Human Pancreas and Pancreatic Cancer Associated
Gene Sequences and Polypeptides 

Field of the Invention 

This invention relates to newly identified pancreas or pancreatic cancer related 
polynucleotides and the polypeptides encoded by these polynucleotides herein collectively 
known as "pancreatic cancer antigens," and to the complete gene sequences associated 
therewith and to the expression products thereof, as well as the use of such pancreatic cancer 
antigens for detection, prevention and treatment of disorders of the pancreas, particularly the 
presence of pancreatic cancer. This invention relates to the pancreatic cancer antigens as well 
as vectors, host cells, antibodies directed to pancreatic cancer antigens and recombinant and 
synthetic methods for producing the same. Also provided are diagnostic methods for 
diagnosing and treating, preventing and/or prognosing disorders related to the pancreas, 
including pancreatic cancer, and therapeutic methods for treating such disorders. The 
invention further relates to screening methods for identifying agonists and antagonists of 
pancreatic cancer antigens of the invention. The present invention further relates to methods 
and/or compositions for inhibiting the production and/or function of the polypeptides of the 
present invention. 

Background of the Invention 

Cell growth is a carefully regulated process which responds to specific needs of the 
body. Occasionally, the intricate and highly regulated controls dictating the rules for
cellular division break down. When this occurs, the cell begins to grow and divide 
independently of its homeostatic regulation resulting in a condition commonly referred to as 
cancer. In fact, cancer is the second leading cause of death among Americans aged 25-44. 

Pancreatic cancer is one of the most dangerous cancers, killing half its victims within 
6 weeks and having a 5-year survival rate of only 1%. The diagnosis of pancreatic carcinoma 
is often associated with a poor prognosis, because most patients already have advanced 
disease. Despite the many advances reported during the past few years, pancreatic cancer 
remains a profound therapeutic challenge. It is hoped that the increasing knowledge of the
molecular biology of pancreatic carcinoma will lead to improvements in diagnosing, staging,
and treating pancreatic adenocarcinoma (Brand et al., Curr Opin Oncol 10:362-6 (1998)). 

There is a need, therefore, for identification and characterization of factors that 
modulate activation and differentiation of pancreatic cells, both normally and in disease 
states. In particular, there is a need to isolate and characterize additional molecules that 
mediate apoptosis, DNA repair, tumor-mediated angiogenesis, genetic imprinting, immune 
responses to tumors and tumor antigens and, among other things, that can play a role in 
detecting, preventing, ameliorating or correcting dysfunctions or diseases related to the 
pancreas. 

Summary of the Invention 

The present invention includes isolated nucleic acid molecules comprising, or 
alternatively, consisting of, a pancreas and/or pancreatic cancer associated polynucleotide 
sequence disclosed in the sequence listing (as SEQ ID NOs: 1 to 459) and/or contained in a 
human cDNA clone described in Tables 1, 2 and 5 and deposited with the American Type 
Culture Collection ("ATCC"). Fragments, variants, and derivatives of these nucleic acid
molecules are also encompassed by the invention. The present invention also includes 
isolated nucleic acid molecules comprising, or alternatively consisting of, a polynucleotide 
encoding a pancreas and/or pancreatic cancer polypeptide. The present invention further 
includes pancreas and/or pancreatic cancer polypeptides encoded by these polynucleotides. 
Further provided for are amino acid sequences comprising, or alternatively consisting of, 
pancreas and/or pancreatic cancer polypeptides as disclosed in the sequence listing (as SEQ 
ID NOs: 460 to 918) and/or encoded by a human cDNA clone described in Tables 1, 2 and 5
and deposited with the ATCC. Antibodies that bind these polypeptides are also encompassed 
by the invention. Polypeptide fragments, variants, and derivatives of these amino acid 
sequences are also encompassed by the invention, as are polynucleotides encoding these 
polypeptides and antibodies that bind these polypeptides. Also provided are diagnostic 
methods for diagnosing and treating, preventing, and/or prognosing disorders related to the 
pancreas, including pancreatic cancer, and therapeutic methods for treating such disorders. 
The invention further relates to screening methods for identifying agonists and antagonists of 
pancreatic cancer antigens of the invention.

Detailed Description

Tables 

Table 1 summarizes some of the pancreatic cancer antigens encompassed by the
invention (including contig sequences (SEQ ID NO:X) and the cDNA clone related to the
contig sequence) and further summarizes certain characteristics of the pancreatic cancer
polynucleotides and the polypeptides encoded thereby. The first column shows the "SEQ ID
NO:" for each of the 459 pancreatic cancer antigen polynucleotide sequences of the
invention. The second column provides a unique "Sequence/Contig ID" identification for
each pancreas and/or pancreatic cancer associated sequence. The third column, "Gene
Name," and the fourth column, "Overlap," provide a putative identification of the gene based
on the sequence similarity of its translation product to an amino acid sequence found in a
publicly accessible gene database and the database accession no. for the database sequence
having similarity, respectively. The fifth and sixth columns provide the location (nucleotide
position nos. within the contig), "Start" and "End", in the polynucleotide sequence "SEQ ID
NO:X" that delineate the preferred ORF shown in the sequence listing as SEQ ID NO:Y.
The seventh and eighth columns provide the "% Identity" (percent identity) and "%
Similarity" (percent similarity), respectively, observed between the aligned sequence
segments of the translation product of SEQ ID NO:X and the database sequence. The ninth
column provides a unique "Clone ID" for a cDNA clone related to each contig sequence.

Table 2 summarizes ATCC Deposits, Deposit dates, and ATCC designation numbers 
of deposits made with the ATCC in connection with the present application. 

Table 3 indicates public ESTs, of which at least one, two, three, four, five, ten, fifteen
or more of any one or more of these public EST sequences are optionally excluded from
certain embodiments of the invention.

Table 4 lists residues comprising antigenic epitopes of antigenic epitope-bearing
fragments present in most of the pancreas and/or pancreatic cancer associated
polynucleotides described in Table 1 as predicted by the inventors using the algorithm of
Jameson and Wolf, (1988) Comp. Appl. Biosci. 4:181-186. The Jameson-Wolf antigenic
analysis was performed using the computer program PROTEAN (Version 3.11 for the Power
Macintosh, DNASTAR, Inc., 1228 South Park Street, Madison, WI). Pancreas and pancreatic
cancer associated polypeptides (e.g., SEQ ID NO:Y, polypeptides encoded by SEQ ID NO:X,
or polypeptides encoded by the cDNA in the referenced cDNA clone) may possess one or
more antigenic epitopes comprising residues described in Table 4. It will be appreciated that
depending on the analytical criteria used to predict antigenic determinants, the exact address
of the determinant may vary slightly. The residues and locations shown in column two of
Table 4 correspond to the amino acid sequences for most pancreas and/or pancreatic cancer
associated polypeptide sequences shown in the Sequence Listing.

Table 5 shows the cDNA libraries sequenced, and ATCC designation numbers and 
vector information relating to these cDNA libraries. 

Definitions 

The following definitions are provided to facilitate understanding of certain terms 
used throughout this specification. 

In the present invention, "isolated" refers to material removed from its original 
environment (e.g., the natural environment if it is naturally occurring), and thus is altered "by 
the hand of man" from its natural state. For example, an isolated polynucleotide could be part 
of a vector or a composition of matter, or could be contained within a cell, and still be 
"isolated" because that vector, composition of matter, or particular cell is not the original 
environment of the polynucleotide. The term "isolated" does not refer to genomic or cDNA 
libraries, whole cell total or mRNA preparations, genomic DNA preparations (including 
those separated by electrophoresis and transferred onto blots), sheared whole cell genomic 
DNA preparations or other compositions where the art demonstrates no distinguishing 
features of the polynucleotide/sequences of the present invention. 

As used herein, a "polynucleotide" refers to a molecule having a nucleic acid 
sequence contained in SEQ ID NO:X (as described in column 1 of Table 1) or the related 
cDNA clone (as described in column 9 of Table 1 and contained within a library deposited 
with the ATCC). For example, the polynucleotide can contain the nucleotide sequence of the 
full length cDNA sequence, including the 5' and 3' untranslated sequences, the coding region, 
as well as fragments, epitopes, domains, and variants of the nucleic acid sequence.
Moreover, as used herein, a "polypeptide" refers to a molecule having an amino acid 
sequence encoded by a polynucleotide of the invention as broadly defined (obviously
excluding poly-Phenylalanine or poly-Lysine peptide sequences which result from translation
of a polyA tail of a sequence corresponding to a cDNA). 

In the present invention, "SEQ ID NO:X" was often generated by overlapping
sequences contained in multiple clones (contig analysis). A representative clone containing 
all or most of the sequence for SEQ ID NO:X is deposited at Human Genome Sciences, Inc. 
(HGS) in a catalogued and archived library. As shown in column 9 of Table 1, each clone is
identified by a cDNA Clone ID. Each Clone ID is unique to an individual clone and the 
Clone ID is all the information needed to retrieve a given clone from the HGS library. In 
addition to the individual cDNA clone deposits, most of the cDNA libraries from which the 
clones were derived were deposited at the American Type Culture Collection (hereinafter 
"ATCC"). Table 5 provides a list of the deposited cDNA libraries. One can use the Clone ID 
to determine the library source by reference to Tables 2 and 5. Table 5 lists the deposited 
cDNA libraries by name and links each library to an ATCC Deposit. Library names contain 
four characters, for example, "HTWE." The name of a cDNA clone ("Clone ID") isolated 
from that library begins with the same four characters, for example "HTWEP07." As
mentioned below, Table 1 correlates the Clone ID names with SEQ ID NOs. Thus, starting
with a SEQ ID NO, one can use Tables 1, 2 and 5 to determine the corresponding Clone ID,
from which library it came and in which ATCC deposit the library is contained. Furthermore, 
it is possible to retrieve a given cDNA clone from the source library by techniques known in
the art and described elsewhere herein. The ATCC is located at 10801 University Boulevard,
Manassas, Virginia 20110-2209, USA. The ATCC deposits were made pursuant to the terms
of the Budapest Treaty on the international recognition of the deposit of microorganisms for 
the purposes of patent procedure. 
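
The Clone ID naming convention described above can be sketched in a few lines of Python. This is only an illustration: the function name `library_name_from_clone_id` is hypothetical, and the only rule assumed is the one stated in the text, namely that a Clone ID begins with the four-character name of its source library.

```python
LIBRARY_NAME_LENGTH = 4  # library names contain four characters, e.g. "HTWE"


def library_name_from_clone_id(clone_id: str) -> str:
    """Return the four-character library name that prefixes a Clone ID.

    Hypothetical helper for illustration; it only applies the prefix rule
    and does not consult Table 2 or Table 5.
    """
    if len(clone_id) <= LIBRARY_NAME_LENGTH:
        raise ValueError(f"Clone ID too short to carry a library prefix: {clone_id!r}")
    return clone_id[:LIBRARY_NAME_LENGTH]


library = library_name_from_clone_id("HTWEP07")
```

The resulting library name would then be looked up in Table 5 to find the ATCC deposit that contains the library.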

A "polynucleotide" of the present invention also includes those polynucleotides 
capable of hybridizing, under stringent hybridization conditions, to sequences contained in 
SEQ ID NO:X, or the complement thereof (e.g., the complement of any one, two, three, four, 
or more of the polynucleotide fragments described herein), and/or sequences contained in the 
related cDNA clone within a library deposited with the ATCC. "Stringent hybridization
conditions" refers to an overnight incubation at 42°C in a solution comprising 50%
formamide, 5x SSC (750 mM NaCl, 75 mM trisodium citrate), 50 mM sodium phosphate (pH
7.6), 5x Denhardt's solution, 10% dextran sulfate, and 20 µg/ml denatured, sheared salmon
sperm DNA, followed by washing the filters in 0.1x SSC at about 65°C.

Also included within "polynucleotides" of the present invention are nucleic acid
molecules that hybridize to the polynucleotides of the present invention at lower stringency
hybridization conditions. Changes in the stringency of hybridization and signal detection are
primarily accomplished through the manipulation of formamide concentration (lower
percentages of formamide result in lowered stringency), salt conditions, or temperature. For
example, lower stringency conditions include an overnight incubation at 37°C in a
solution comprising 6X SSPE (20X SSPE = 3M NaCl; 0.2M NaH2PO4; 0.02M EDTA, pH
7.4), 0.5% SDS, 30% formamide, 100 µg/ml salmon sperm blocking DNA; followed by
washes at 50°C with 1X SSPE, 0.1% SDS. In addition, to achieve even lower
stringency, washes performed following stringent hybridization can be done at higher salt
concentrations (e.g. 5X SSC).
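
The buffer strengths quoted above (5x and 0.1x SSC, 20X, 6X and 1X SSPE) are simple multiples of one another, so the component concentrations at any working strength follow by proportion. The Python sketch below shows that arithmetic. The stock compositions are taken from the text; the function and variable names are hypothetical, and the sketch makes no claim about how the buffers were actually prepared.

```python
# Component concentrations in mM at the strengths stated in the text.
SSC_5X_MM = {"NaCl": 750.0, "trisodium citrate": 75.0}
SSPE_20X_MM = {"NaCl": 3000.0, "NaH2PO4": 200.0, "EDTA": 20.0}


def scale_buffer(stock_mm: dict, stock_strength: float, target_strength: float) -> dict:
    """Scale every component of a buffer from one 'x' strength to another."""
    factor = target_strength / stock_strength
    return {component: conc * factor for component, conc in stock_mm.items()}


# Stringent wash buffer: 0.1x SSC derived from the 5x SSC composition.
ssc_wash = scale_buffer(SSC_5X_MM, stock_strength=5, target_strength=0.1)

# Lower stringency hybridization and wash buffers derived from 20X SSPE.
sspe_hyb = scale_buffer(SSPE_20X_MM, stock_strength=20, target_strength=6)
sspe_wash = scale_buffer(SSPE_20X_MM, stock_strength=20, target_strength=1)
```

Comparing `ssc_wash["NaCl"]` with `sspe_wash["NaCl"]` makes the difference in wash salt concentration between the stringent and lower stringency conditions explicit.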

Note that variations in the above conditions may be accomplished through the 
inclusion and/or substitution of alternate blocking reagents used to suppress background in 
hybridization experiments. Typical blocking reagents include Denhardt's reagent, BLOTTO, 
heparin, denatured salmon sperm DNA, and commercially available proprietary formulations. 
The inclusion of specific blocking reagents may require modification of the hybridization 
conditions described above, due to problems with compatibility. 

Of course, a polynucleotide which hybridizes only to polyA+ sequences (such as any 
3' terminal polyA+ tract of a cDNA shown in the sequence listing), or to a complementary
stretch of T (or U) residues, would not be included in the definition of "polynucleotide," since
such a polynucleotide would hybridize to any nucleic acid molecule containing a poly (A) 
stretch or the complement thereof (e.g., practically any double-stranded cDNA clone 
generated using oligo dT as a primer). 

The polynucleotides of the present invention can be composed of any 
polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or
modified RNA or DNA. For example, polynucleotides can be composed of single- and
double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single-
and double-stranded RNA, and RNA that is a mixture of single- and double-stranded regions,
hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, 
double-stranded or a mixture of single- and double-stranded regions. In addition, the 
polynucleotide can be composed of triple-stranded regions comprising RNA or DNA or both 
RNA and DNA. A polynucleotide may also contain one or more modified bases or DNA or
RNA backbones modified for stability or for other reasons. "Modified" bases include, for
example, tritylated bases and unusual bases such as inosine. A variety of modifications can 
be made to DNA and RNA; thus, "polynucleotide" embraces chemically, enzymatically, or 
metabolically modified forms. 

In specific embodiments, the polynucleotides of the invention are at least 15, at least 
30, at least 50, at least 100, at least 125, at least 500, or at least 1000 continuous nucleotides 
but are less than or equal to 300 kb, 200 kb, 100 kb, 50 kb, 15 kb, 10 kb, 7.5 kb, 5 kb, 2.5 kb,
2.0 kb, or 1 kb, in length. In a further embodiment, polynucleotides of the invention 
comprise a portion of the coding sequences, as disclosed herein, but do not comprise all or a 
portion of any intron. In another embodiment, the polynucleotides comprising coding 
sequences do not contain coding sequences of a genomic flanking gene (i.e., 5' or 3' to the
gene of interest in the genome). In other embodiments, the polynucleotides of the invention 
do not contain the coding sequence of more than 1000, 500, 250, 100, 50, 25, 20, 15, 10, 5, 4,
3, 2, or 1 genomic flanking gene(s). 

"SEQ ID NO:X" refers to a pancreatic cancer antigen polynucleotide sequence 
described in Table 1. SEQ ID NO:X is identified by an integer specified in column 1 of Table
1. The polypeptide sequence SEQ ID NO:Y is a translated open reading frame (ORF) 
encoded by polynucleotide SEQ ID NO:X. There are 459 pancreatic cancer antigen 
polynucleotide sequences described in Table 1 and shown in the sequence listing (SEQ ID
NO:1 through SEQ ID NO:459). Likewise there are 459 polypeptide sequences shown in the
sequence listing, one polypeptide sequence for each of the polynucleotide sequences (SEQ ID
NO:460 through SEQ ID NO:918). The polynucleotide sequences are shown in the sequence
listing immediately followed by all of the polypeptide sequences. Thus, a polypeptide
sequence corresponding to polynucleotide sequence SEQ ID NO:1 is the first polypeptide
sequence shown in the sequence listing. The second polypeptide sequence corresponds to the
polynucleotide sequence shown as SEQ ID NO:2, and so on. In other words, since there are
459 polynucleotide sequences, for any polynucleotide sequence SEQ ID NO:X, a 
corresponding polypeptide SEQ ID NO:Y can be determined by the formula X + 459 = Y. In 
addition, any of the unique "Sequence/Contig ID" defined in column 2 of Table 1, can be 
linked to the corresponding polypeptide SEQ ID NO:Y by reference to Table 4. 
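
The numbering rule X + 459 = Y can be written out as a small Python sketch. The function names are hypothetical and serve only to illustrate the arithmetic stated above, together with the valid ranges for X (1 to 459) and Y (460 to 918).

```python
N_POLYNUCLEOTIDES = 459  # number of polynucleotide sequences in the sequence listing


def polypeptide_seq_id(x: int) -> int:
    """Map polynucleotide SEQ ID NO:X to polypeptide SEQ ID NO:Y using Y = X + 459."""
    if not 1 <= x <= N_POLYNUCLEOTIDES:
        raise ValueError(f"X must be between 1 and {N_POLYNUCLEOTIDES}, got {x}")
    return x + N_POLYNUCLEOTIDES


def polynucleotide_seq_id(y: int) -> int:
    """Inverse mapping: polypeptide SEQ ID NO:Y back to polynucleotide SEQ ID NO:X."""
    if not N_POLYNUCLEOTIDES < y <= 2 * N_POLYNUCLEOTIDES:
        raise ValueError(
            f"Y must be between {N_POLYNUCLEOTIDES + 1} and {2 * N_POLYNUCLEOTIDES}, got {y}"
        )
    return y - N_POLYNUCLEOTIDES


first_y = polypeptide_seq_id(1)    # the first polypeptide sequence in the listing
last_y = polypeptide_seq_id(459)   # the last polypeptide sequence in the listing
```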

The polypeptides of the present invention can be composed of amino acids joined to 
each other by peptide bonds or modified peptide bonds, i.e., peptide isosteres, and may
contain amino acids other than the 20 gene-encoded amino acids. The polypeptides may be
modified by either natural processes, such as posttranslational processing, or by chemical
modification techniques which are well known in the art. Such modifications are well
described in basic texts and in more detailed monographs, as well as in a voluminous research
literature. Modifications can occur anywhere in a polypeptide, including the peptide
backbone, the amino acid side-chains and the amino or carboxyl termini. It will be
appreciated that the same type of modification may be present in the same or varying degrees
at several sites in a given polypeptide. Also, a given polypeptide may contain many types of
modifications. Polypeptides may be branched, for example, as a result of ubiquitination, and
they may be cyclic, with or without branching. Cyclic, branched, and branched cyclic
polypeptides may result from posttranslation natural processes or may be made by synthetic
methods. Modifications include acetylation, acylation, ADP-ribosylation, amidation,
covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of
a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative,
covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond
formation, demethylation, formation of covalent cross-links, formation of cysteine, formation
of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation,
hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, proteolytic
processing, phosphorylation, prenylation, racemization, selenoylation, sulfation,
transfer-RNA mediated addition of amino acids to proteins such as arginylation, and ubiquitination.
(See, for instance, PROTEINS - STRUCTURE AND MOLECULAR PROPERTIES, 2nd
Ed., T. E. Creighton, W. H. Freeman and Company, New York (1993);
POSTTRANSLATIONAL COVALENT MODIFICATION OF PROTEINS, B. C. Johnson,
Ed., Academic Press, New York, pgs. 1-12 (1983); Seifter et al., Meth Enzymol 182:626-646
(1990); Rattan et al., Ann NY Acad Sci 663:48-62 (1992).)

The pancreas and pancreatic cancer polypeptides of the invention can be prepared in 
any suitable manner. Such polypeptides include isolated naturally occurring polypeptides, 
recombinantly produced polypeptides, synthetically produced polypeptides, or polypeptides 
produced by a combination of these methods. Means for preparing such polypeptides are
well understood in the art.

The polypeptides may be in the form of the secreted protein, including the mature 
form, or may be a part of a larger protein, such as a fusion protein (see below). It is often
advantageous to include an additional amino acid sequence which contains secretory or
leader sequences, pro-sequences, sequences which aid in purification, such as multiple 
histidine residues, or an additional sequence for stability during recombinant production. 

The pancreas and pancreatic cancer polypeptides of the present invention are
preferably provided in an isolated form, and preferably are substantially purified. A
recombinantly produced version of a polypeptide, including the secreted polypeptide, can be
substantially purified using techniques described herein or otherwise known in the art, such
as, for example, by the one-step method described in Smith and Johnson, Gene 67:31-40
(1988). Polypeptides of the invention also can be purified from natural, synthetic or
recombinant sources using techniques described herein or otherwise known in the art, such
as, for example, antibodies of the invention raised against the polypeptides of the present
invention in methods which are well known in the art.

By a polypeptide demonstrating a "functional activity" is meant a polypeptide
capable of displaying one or more known functional activities associated with a full-length
(complete) protein of the invention. Such functional activities include, but are not limited to,
biological activity, antigenicity [ability to bind (or compete with a polypeptide for binding)
to an anti-polypeptide antibody], immunogenicity (ability to generate antibody which binds to
a specific polypeptide of the invention), ability to form multimers with polypeptides of the
invention, and ability to bind to a receptor or ligand for a polypeptide.

"A polypeptide having functional activity" refers to polypeptides exhibiting activity
similar, but not necessarily identical to, an activity of a polypeptide of the present invention,
including mature forms, as measured in a particular assay, such as, for example, a biological
assay, with or without dose dependency. In the case where dose dependency does exist, it
need not be identical to that of the polypeptide, but rather substantially similar to the
dose-dependence in a given activity as compared to the polypeptide of the present invention (i.e.,
the candidate polypeptide will exhibit greater activity or not more than about 25-fold less
and, preferably, not more than about tenfold less activity, and most preferably, not more than
about three-fold less activity relative to the polypeptide of the present invention).
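
The fold-activity thresholds above can be illustrated with a short Python sketch. The function name, the tier labels, and the treatment of "about" as a hard cutoff are all assumptions made for illustration; the text itself leaves the thresholds approximate and does not prescribe any particular assay readout.

```python
def activity_tier(candidate_activity: float, reference_activity: float) -> str:
    """Classify a candidate polypeptide by how many fold less active it is
    than the reference polypeptide in the same assay.

    Hypothetical helper: cutoffs of 3, 10 and 25 mirror the "about" values
    in the text but are applied here as exact limits.
    """
    if candidate_activity <= 0 or reference_activity <= 0:
        raise ValueError("activities must be positive to compute a fold difference")
    fold_less = reference_activity / candidate_activity  # below 1 means greater activity
    if fold_less <= 3:
        return "most preferred"
    if fold_less <= 10:
        return "preferred"
    if fold_less <= 25:
        return "within range"
    return "outside range"


tier = activity_tier(candidate_activity=5.0, reference_activity=100.0)  # 20-fold less
```

A candidate with greater activity than the reference has a fold difference below 1 and therefore falls in the first tier, consistent with the "greater activity" alternative in the text.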

The functional activity of the pancreatic cancer antigen polypeptides, and fragments,
variants, derivatives, and analogs thereof, can be assayed by various methods.

For example, in one embodiment where one is assaying for the ability to bind or 
compete with a full-length polypeptide of the present invention for binding to an antibody to
the full-length polypeptide, various immunoassays known in the art can be used,
including but not limited to, competitive and non-competitive assay systems using techniques
such as radioimmunoassays, ELISA (enzyme linked immunosorbent assay), "sandwich" 
immunoassays, immunoradiometric assays, gel diffusion precipitation reactions, 
immunodiffusion assays, in situ immunoassays (using colloidal gold, enzyme or radioisotope 
labels, for example), western blots, precipitation reactions, agglutination assays (e.g., gel
agglutination assays, hemagglutination assays), complement fixation assays, 
immunofluorescence assays, protein A assays, and immunoelectrophoresis assays, etc. In one 
embodiment, antibody binding is detected by detecting a label on the primary antibody. In 
another embodiment, the primary antibody is detected by detecting binding of a secondary 
antibody or reagent to the primary antibody. In a further embodiment, the secondary 
antibody is labeled. Many means are known in the art for detecting binding in an 
immunoassay and are within the scope of the present invention. 

In another embodiment, where a ligand is identified, or the ability of a polypeptide 
fragment, variant or derivative of the invention to multimerize is being evaluated, binding can 
be assayed, e.g., by means well-known in the art, such as, for example, reducing and
non-reducing gel chromatography, protein affinity chromatography, and affinity blotting. See
generally, Phizicky, E., et al., Microbiol. Rev. 59:94-123 (1995). In another embodiment, 
physiological correlates of a polypeptide of the present invention binding to its substrates (signal
transduction) can be assayed. 

In addition, assays described herein (see Examples) and otherwise known in the art
may routinely be applied to measure the ability of polypeptides of the present invention and
fragments, variants, derivatives and analogs thereof to elicit polypeptide related biological
activity (either in vitro or in vivo). Other methods will be known to the skilled artisan and
are within the scope of the invention.

Pancreas and Pancreatic Cancer Associated Polynucleotides and Polypeptides of the 
Invention 

It has been discovered herein that the polynucleotides described in Table 1 are 
expressed at significantly enhanced levels in human pancreas and/or pancreatic cancer 
tissues. Accordingly, such polynucleotides, polypeptides encoded by such polynucleotides,
and antibodies specific for such polypeptides find use in the prediction, diagnosis, prevention
and treatment of pancreas related disorders, including pancreatic cancer as more fully 
described below. 

Table 1 summarizes some of the polynucleotides encompassed by the invention 
(including contig sequences (SEQ ID NO:X) and the related cDNA clones) and further 
summarizes certain characteristics of these pancreas and/or pancreatic cancer associated 
polynucleotides and the polypeptides encoded thereby.

The first column of Table 1 shows the "SEQ ID NO:" for each of the 459 pancreatic
cancer antigen polynucleotide sequences of the invention. 

The second column in Table 1 provides a unique "Sequence/Contig ID" identification
for each pancreas and/or pancreatic cancer associated sequence. The third column in Table 1,
"Gene Name," provides a putative identification of the gene based on the sequence similarity
of its translation product to an amino acid sequence found in a publicly accessible gene
database, such as GenBank (NCBI). The great majority of the cDNA sequences reported in
Table 1 are unrelated to any sequences previously described in the literature. The fourth
column in Table 1, "Overlap," provides the database accession no. for the database sequence
having similarity. The fifth and sixth columns in Table 1 provide the location (nucleotide
position nos. within the contig), "Start" and "End", in the polynucleotide sequence "SEQ ID
NO:X" that delineate the preferred ORF shown in the sequence listing as SEQ ID NO:Y. In
one embodiment, the invention provides a protein comprising, or alternatively consisting of, a
polypeptide encoded by the portion of SEQ ID NO:X delineated by the nucleotide position
nos. "Start" and "End". Also provided are polynucleotides encoding such proteins and the
complementary strand thereto. The seventh and eighth columns provide the "% Identity"
(percent identity) and "% Similarity" (percent similarity) observed between the aligned
sequence segments of the translation product of SEQ ID NO:X and the database sequence.

The ninth column of Table 1 provides a unique "Clone ID" for a clone related to each
contig sequence. This clone ID references the cDNA clone which contains at least the 5'-most
sequence of the assembled contig, and at least a portion of SEQ ID NO:X was determined by
directly sequencing the referenced clone. The reference clone may have more sequence than
described in the sequence listing or the clone may have less. In the vast majority of cases,
however, the clone is believed to encode a full-length polypeptide. In the case where a clone
is not full-length, a full-length cDNA can be obtained by methods described elsewhere
herein.

Table 3 indicates public ESTs, of which at least one, two, three, four, five, ten, or 
more of any one or more of these public ESTs are optionally excluded from the invention. 

SEQ ID NO:X (where X may be any of the polynucleotide sequences disclosed in the
sequence listing as SEQ ID NO:1 through SEQ ID NO:459) and the translated SEQ ID NO:Y
(where Y may be any of the polypeptide sequences disclosed in the sequence listing as SEQ
ID NO:460 through SEQ ID NO:918) are sufficiently accurate and otherwise suitable for a
variety of uses well known in the art and described further below. For instance, SEQ ID
NO:X has uses including, but not limited to, in designing nucleic acid hybridization probes
that will detect nucleic acid sequences contained in SEQ ID NO:X or the related cDNA clone
contained in a library deposited with the ATCC. These probes will also hybridize to nucleic
acid molecules in biological samples, thereby enabling immediate applications in
chromosome mapping, linkage analysis, tissue identification and/or typing, and a variety of
forensic and diagnostic methods of the invention. Similarly, polypeptides identified from
SEQ ID NO:Y have uses that include, but are not limited to, generating antibodies which
bind specifically to the pancreatic cancer antigen polypeptides, or fragments thereof, and/or
to the pancreatic cancer antigen polypeptides encoded by the cDNA clones identified in
Table 1.

Nevertheless, DNA sequences generated by sequencing reactions can contain 
sequencing errors. The errors exist as misidentified nucleotides, or as insertions or deletions 
of nucleotides in the generated DNA sequence. The erroneously inserted or deleted
nucleotides cause frame shifts in the reading frames of the predicted amino acid sequence. In
these cases, the predicted amino acid sequence diverges from the actual amino acid sequence, 
even though the generated DNA sequence may be greater than 99.9% identical to the actual 
DNA sequence (for example, one base insertion or deletion in an open reading frame of over 
1000 bases). 
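
The effect described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the 1002-base ORF is an invented repeat sequence, not a sequence of the invention, and the inserted base stands in for a hypothetical sequencing error. The codon table is the standard genetic code with codons enumerated in T, C, A, G order.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons in the first reading frame; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity_with_one_insertion(original_length: int) -> float:
    """Fraction of matching alignment columns when the only difference is one inserted base."""
    return original_length / (original_length + 1)


# Invented 1002-base ORF: start codon, 83 copies of a 12-base unit, stop codon.
orf = "ATG" + "GCCGATCTGAAA" * 83 + "TAA"
# Hypothetical sequencing error: one spurious G inserted after base 300.
misread = orf[:300] + "G" + orf[300:]

true_protein = translate(orf)
predicted_protein = translate(misread)
dna_identity = identity_with_one_insertion(len(orf))
```

In this sketch the two DNA sequences are more than 99.9% identical, yet `true_protein` and `predicted_protein` agree only up to the codon containing the insertion and diverge from that point on.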

Accordingly, for those applications requiring precision in the nucleotide sequence or
the amino acid sequence, the present invention provides not only the generated nucleotide
sequence identified as SEQ ID NO:X, the predicted translated amino acid sequence identified
as SEQ ID NO:Y, but also a sample of plasmid DNA containing the related cDNA clone
(deposited with the ATCC, as set forth in Table 1). The nucleotide sequence of each
deposited clone can readily be determined by sequencing the deposited clone in accordance
with known methods. Further, techniques known in the art can be used to verify the
nucleotide sequences of SEQ ID NO:X.

The predicted amino acid sequence can then be verified from such deposits. 
Moreover, the amino acid sequence of the protein encoded by a particular clone can also be
directly determined by peptide sequencing or by expressing the protein in a suitable host cell
containing the deposited human cDNA, collecting the protein, and determining its sequence. 

The present invention also relates to vectors or plasmids which include such DNA
sequences, as well as the use of the DNA sequences. The material deposited with the ATCC 
on:

Table 2

ATCC Deposits                        Deposit Date   ATCC Designation Number
LP01, LP02, LP03, LP04, LP05,        May-20-97      209059, 209060, 209061, 209062, 209063,
LP06, LP07, LP08, LP09, LP10, LP11                  209064, 209065, 209066, 209067, 209068,
                                                    209069
LP12                                 Jan-12-98      209579
LP13                                 Jan-12-98      209578
LP14                                 Jul-16-98      203067
LP15                                 Jul-16-98      203068
LP16                                 Feb-1-99       203609
LP17                                 Feb-1-99       203610
LP20                                 Nov-17-98      203485
LP21                                 Jun-18-99      PTA-252
LP22                                 Jun-18-99      PTA-253
LP23                                 Dec-22-99      PTA-1081

each is a mixture of cDNA clones derived from a variety of human tissue and cloned in either 
a plasmid vector or a phage vector, as shown in Table 5. These deposits are referred to as
"the deposits" herein. The tissues from which the clones were derived are listed in Table 5,
and the vector in which the cDNA is contained is also indicated in Table 5. The deposited
material includes the cDNA clones which were partially sequenced and are related to the
SEQ ID NO:X described in Table 1 (column 9). Thus, a clone which is isolatable from the
ATCC Deposits by use of a sequence listed as SEQ ID NO:X may include the entire coding
region of a human gene or in other cases such clone may include a substantial portion of the
coding region of a human gene. Although the sequence listing lists only a portion of the
DNA sequence in a clone included in the ATCC Deposits, it is well within the ability of one
skilled in the art to complete the sequence of the DNA included in a clone isolatable from the
ATCC Deposits by use of a sequence (or portion thereof) listed in Table 1 by procedures
hereinafter further described, and others apparent to those skilled in the art.

Also provided in Table 5 is the name of the vector which contains the cDNA clone. 
Each vector is routinely used in the art. The following additional information is provided for 
convenience.

Vectors Lambda Zap (U.S. Patent Nos. 5,128,256 and 5,286,636), Uni-Zap XR (U.S.
Patent Nos. 5,128,256 and 5,286,636), Zap Express (U.S. Patent Nos. 5,128,256 and
5,286,636), pBluescript (pBS) (Short, J. M. et al., Nucleic Acids Res. 16:7583-7600 (1988);
Alting-Mees, M. A. and Short, J. M., Nucleic Acids Res. 17:9494 (1989)) and pBK (Alting-Mees,
M. A. et al., Strategies 5:58-61 (1992)) are commercially available from Stratagene
Cloning Systems, Inc., 11011 N. Torrey Pines Road, La Jolla, CA, 92037. pBS contains an
ampicillin resistance gene and pBK contains a neomycin resistance gene. Phagemid pBS
may be excised from the Lambda Zap and Uni-Zap XR vectors, and phagemid pBK may be
excised from the Zap Express vector. Both phagemids may be transformed into E. coli strain
XL-1 Blue, also available from Stratagene.

Vectors pSport1, pCMVSport 1.0, pCMVSport 2.0 and pCMVSport 3.0 were
obtained from Life Technologies, Inc., P. O. Box 6009, Gaithersburg, MD 20897. All Sport
vectors contain an ampicillin resistance gene and may be transformed into E. coli strain
DH10B, also available from Life Technologies. See, for instance, Gruber, C. E., et al., Focus
15:59 (1993). Vector lafmid BA (Bento Soares, Columbia University, New York, NY)
contains an ampicillin resistance gene and can be transformed into E. coli strain XL-1 Blue.
Vector pCR®2.1, which is available from Invitrogen, 1600 Faraday Avenue, Carlsbad, CA
92008, contains an ampicillin resistance gene and may be transformed into E. coli strain
DH10B, available from Life Technologies. See, for instance, Clark, J. M., Nuc. Acids Res.
16:9677-9686 (1988) and Mead, D. et al., Bio/Technology 9: (1991).

The present invention also relates to the genes corresponding to SEQ ID NO:X, SEQ 
ID NO:Y, and/or the cDNA contained in a deposited cDNA clone. The corresponding gene 
can be isolated in accordance with known methods using the sequence information disclosed 
herein. Such methods include, but are not limited to, preparing probes or primers from the
disclosed sequence and identifying or amplifying the corresponding gene from appropriate
sources of genomic material. 

Also provided in the present invention are allelic variants, orthologs, and/or species
homologs. Procedures known in the art can be used to obtain full-length genes, allelic
variants, splice variants, full-length coding portions, orthologs, and/or species homologs of
genes corresponding to SEQ ID NO:X, SEQ ID NO:Y, and/or the cDNA contained in the
related cDNA clone in the deposit, using information from the sequences disclosed herein or
the clones deposited with the ATCC. For example, allelic variants and/or species homologs 
may be isolated and identified by making suitable probes or primers from the sequences 
provided herein and screening a suitable nucleic acid source for allelic variants and/or the 
desired homologue. 

The present invention provides a polynucleotide comprising, or alternatively
consisting of, the nucleic acid sequence of SEQ ID NO:X, and/or the related cDNA clone
(See, e.g., columns 1 and 9 of Table 1). The present invention also provides a polypeptide
comprising, or alternatively, consisting of, the polypeptide sequence of SEQ ID NO:Y, a
polypeptide encoded by SEQ ID NO:X, and/or a polypeptide encoded by the cDNA in the
related cDNA clone contained in a deposited library. Polynucleotides encoding a polypeptide
comprising, or alternatively consisting of, the polypeptide sequence of SEQ ID NO:Y, a
polypeptide encoded by SEQ ID NO:X, and/or a polypeptide encoded by the cDNA in the
related cDNA clone contained in a deposited library, are also encompassed by the invention.
The present invention further encompasses a polynucleotide comprising, or alternatively
consisting of, the complement of the nucleic acid sequence of SEQ ID NO:X, and/or the
complement of the coding strand of the related cDNA clone contained in a deposited library.

Many polynucleotide sequences, such as EST sequences, are publicly available and 
accessible through sequence databases and may have been publicly available prior to 
conception of the present invention. Preferably, such related polynucleotides are specifically
excluded from the scope of the present invention. To list every related sequence would
unduly burden the disclosure of this application. Accordingly, for each "Contig Id" listed in
the first column of Table 3, preferably excluded are one or more polynucleotides comprising
a nucleotide sequence described in the second column of Table 3 by the general formula of a-b,
each of which is uniquely defined for the SEQ ID NO:X corresponding to that Contig Id
in Table 1. Additionally, specific embodiments are directed to polynucleotide sequences
excluding at least one, two, three, four, five, ten, or more of the specific polynucleotide
sequences referenced by the Genbank Accession No. for each Contig Id which may be
included in column 3 of Table 3. In no way is this listing meant to encompass all of the
sequences which may be excluded by the general formula; it is just a representative example.
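
The general formula a-b used in Table 3 can be read as a simple range test. The sketch below is illustrative only and is not part of the original disclosure; the function names `in_general_formula` and `count_ranges` are hypothetical. The sequence length of 565 is taken from the first row of Table 3, where a runs from 1 to 551 and b from 15 to 565.

```python
def in_general_formula(a: int, b: int, seq_length: int) -> bool:
    """True if positions a..b satisfy the a-b constraints for a sequence of seq_length.

    a may be 1 to (seq_length - 14), b may be 15 to seq_length, and b >= a + 14,
    so every described fragment spans at least 15 nucleotides.
    """
    return (
        1 <= a <= seq_length - 14
        and 15 <= b <= seq_length
        and b >= a + 14
    )


def count_ranges(seq_length: int) -> int:
    """Number of distinct (a, b) pairs permitted by the formula."""
    n = seq_length - 14  # largest permitted value of a
    return n * (n + 1) // 2 if n > 0 else 0


print(in_general_formula(1, 15, 565))     # True: the shortest fragment at the 5' end
print(in_general_formula(551, 565, 565))  # True: the shortest fragment at the 3' end
print(in_general_formula(10, 20, 565))    # False: only 11 nucleotides long
print(count_ranges(565))                  # 152076
```

The count shows why the specification describes the excluded fragments by formula and does not enumerate them.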

Table 3.

Sequence/Contig ID: 456379
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 551 of SEQ ID NO:1, b is an integer of 15 to 565, where
both a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:1, and
where b is greater than or equal to a + 14.
Genbank Accession No.: R34554, AA018972, AA055489

Sequence/Contig ID: 462108
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 1677 of SEQ ID NO:2, b is an integer of 15 to 1691, where
both a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:2, and
where b is greater than or equal to a + 14.
Genbank Accession No.: T79903, R46289, R73001, R73606, N30140, N35752, W32520, W32636,
AAnifi£7^, AAftdO^flfl, AA040683, AA070495, AA070381, AA083072, AA134451, AA207060,
AA207086

Sequence/Contig ID: 503446
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 466 of SEQ ID NO:3, b is an integer of 15 to 480, where
both a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:3, and
where b is greater than or equal to a + 14.
Genbank Accession No.: (none listed)

Sequence/Contig ID: 507841
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 594 of SEQ ID NO:4, b is an integer of 15 to 608, where
both a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:4, and
where b is greater than or equal to a + 14.
Genbank Accession No.: R12126, R14285

Sequence/Contig ID: 509287
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 682 of SEQ ID NO:5, b is an integer of 15 to 696, where
both a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:5, and
where b is greater than or equal to a + 14.
Genbank Accession No.: H01699, H94037, N30572, N57219, N64393, N92189, AA035664,
AA056367, AA115587

Sequence/Contig ID: 509672
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 278 of SEQ ID NO:6, b is an integer of 15 to 292, where
both a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:6, and
where b is greater than or equal to a + 14.
Genbank Accession No.: (none listed)

Sequence/Contig ID: 509673
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 348 of SEQ ID NO:7, b is an integer of 15 to 362, where
both a and [...]

[...] sequence described by the general formula of a-b, where a is any integer between 1 to
606 of SEQ ID NO:303, b is an integer of 15 to 620, where both a and b correspond to the
positions of nucleotide residues shown in SEQ ID NO:303, and where b is greater than or
equal to a + 14.
Genbank Accession No.: H99806, H99813, AA172251, AA468699, AA659754, AA808925, AA837298,
AA8581 IU, AA8o4/ZJ, AA954263, F18115, N99864

Sequence/Contig ID: 831558
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 519 of SEQ ID NO:304, b is an integer of 15 to 533, where
both a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:304, and
where b is greater than or equal to a + 14.
Genbank Accession No.: H60157, W57916, W57917, AA056029, AA056047, AA142858, AA211887,
AA469104, AA659257, AA662867, AA665372, AA728846, AA933045, F17890, AA090265

Sequence/Contig ID: 831847
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 1360 of SEQ ID NO:305, b is an integer of 15 to 1374,
where both a and b correspond to the positions of nucleotide residues shown in SEQ ID
NO:305, and where b is greater than or equal to a + 14.
Genbank Accession No.: (none listed)

Sequence/Contig ID: 831893
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 654 of SEQ ID NO:306, b is an integer of 15 to 668, where
both a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:306, and
where b is greater than or equal to a + 14.
Genbank Accession No.: (none listed)

Sequence/Contig ID: 831903
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 1032 of SEQ ID NO:307, b is an integer of 15 to 1046,
where both a and b correspond to the positions of nucleotide residues shown in SEQ ID
NO:307, and where b is greater than or equal to a + 14.
Genbank Accession No.: (none listed)

Sequence/Contig ID: 831921
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 1672 of SEQ ID NO:308, b is an integer of 15 to 1686,
where both a and b correspond to the positions of nucleotide residues shown in SEQ ID
NO:308, and where b is greater than or equal to a + 14.
Genbank Accession No.: H52554, H66743, H71667, N32238, N77727, W19857, AA017111,
AA074918, AA235917, AA236708

Sequence/Contig ID: 831923
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
where a is any integer between 1 to 1412 of SEQ ID NO:309, b is an integer of 15 to 1426,
where both a and b correspond to the positions of nucleotide residues shown in SEQ ID
NO:309, and where b is greater than or equal to a + 14.
Genbank Accession No.: (none listed)

Sequence/Contig ID: 831959
General formula: Preferably excluded from the present invention are one or more
polynucleotides comprising a nucleotide sequence described by the general formula of a-b,
[...]

Polynucleotide and Polypeptide Variants

The present invention is directed to variants of the polynucleotide sequence disclosed 
in SEQ ID NO:X or the complementary strand thereto, and/or the cDNA sequence contained 
in a cDNA clone contained in the deposit. 

The present invention also encompasses variants of the pancreas and pancreatic
cancer polypeptide sequence disclosed in SEQ ID NO:Y, a polypeptide sequence encoded by
the polynucleotide sequence in SEQ ID NO:X, and/or a polypeptide sequence encoded by the
cDNA in the related cDNA clone contained in the deposit.

"Variant" refers to a polynucleotide or polypeptide differing from the polynucleotide
or polypeptide of the present invention, but retaining essential properties thereof. Generally,
variants are overall closely similar, and, in many regions, identical to the polynucleotide or
polypeptide of the present invention.

The present invention is also directed to nucleic acid molecules which comprise, or 
alternatively consist of, a nucleotide sequence which is at least 80%, 85%, 90%, 95%, 96%,
97%, 98%, 99% or 100% identical to, for example, the nucleotide coding sequence in SEQ
ID NO:X or the complementary strand thereto, the nucleotide coding sequence of the related 
cDNA contained in a deposited library or the complementary strand thereto, a nucleotide 
sequence encoding the polypeptide of SEQ ID NO:Y, a nucleotide sequence encoding a 
polypeptide sequence encoded by the nucleotide sequence in SEQ ID NO:X, a nucleotide
sequence encoding the polypeptide encoded by the cDNA in the related cDNA contained in a
deposited library, and/or polynucleotide fragments of any of these nucleic acid molecules 
(e.g., those fragments described herein). Polypeptides encoded by these nucleic acid 
molecules are also encompassed by the invention. In another embodiment, the invention 
encompasses nucleic acid molecules which comprise, or alternatively consist of, a
polynucleotide which hybridizes under stringent hybridization conditions, or alternatively,
under low stringency conditions, to the nucleotide coding sequence in SEQ ID NO:X, the 
nucleotide coding sequence of the related cDNA clone contained in a deposited library, a 
nucleotide sequence encoding the polypeptide of SEQ ID NO:Y, a nucleotide sequence 
encoding a polypeptide sequence encoded by the nucleotide sequence in SEQ ID NO:X, a
nucleotide sequence encoding the polypeptide encoded by the cDNA in the related cDNA
clone contained in a deposited library, and/or polynucleotide fragments of any of these 
nucleic acid molecules (e.g., those fragments described herein). Polynucleotides which 
hybridize to the complement of these nucleic acid molecules under stringent hybridization
conditions or alternatively, under lower stringency conditions, are also encompassed by the 
invention, as are polypeptides encoded by these polynucleotides. 

The present invention is also directed to polypeptides which comprise, or alternatively 
consist of, an amino acid sequence which is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%,
99% or 100% identical to, for example, the polypeptide sequence shown in SEQ ID NO:Y, a 
polypeptide sequence encoded by the nucleotide sequence in SEQ ID NO:X, a polypeptide 
sequence encoded by the cDNA in the related cDNA clone contained in a deposited library, 
and/or polypeptide fragments of any of these polypeptides (e.g., those fragments described
herein). Polynucleotides which hybridize to the complement of the nucleic acid molecules
encoding these polypeptides under stringent hybridization conditions, or alternatively, under 
lower stringency conditions, are also encompassed by the invention, as are polypeptides 
encoded by these polynucleotides. 

By a nucleic acid having a nucleotide sequence at least, for example, 95% "identical"
to a reference nucleotide sequence of the present invention, it is intended that the nucleotide
sequence of the nucleic acid is identical to the reference sequence except that the nucleotide 
sequence may include up to five point mutations per each 100 nucleotides of the reference 
nucleotide sequence encoding the polypeptide. In other words, to obtain a nucleic acid 
having a nucleotide sequence at least 95% identical to a reference nucleotide sequence, up to
5% of the nucleotides in the reference sequence may be deleted or substituted with another
nucleotide, or a number of nucleotides up to 5% of the total nucleotides in the reference 
sequence may be inserted into the reference sequence. The query sequence may be, for 
example, an entire sequence referred to in Table 1, an ORF (open reading frame), or any 
fragment specified as described herein. 
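
The arithmetic behind this definition can be sketched as follows. The example is illustrative and not part of the original disclosure; the function name `max_differences` is hypothetical, and the sketch assumes each substitution, deletion, or insertion counts as one altered position.

```python
def max_differences(reference_length: int, percent_identity: int) -> int:
    """Largest number of altered positions compatible with the stated percent identity.

    Integer arithmetic is used so that, for example, 95% of 100 positions gives
    exactly 5 permitted alterations with no floating-point rounding.
    """
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent_identity must be between 0 and 100")
    return (100 - percent_identity) * reference_length // 100


print(max_differences(100, 95))   # 5: up to five point mutations per 100 nucleotides
print(max_differences(1000, 99))  # 10
print(max_differences(565, 95))   # 28
```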

As a practical matter, whether any particular nucleic acid molecule or polypeptide is
at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identical to a nucleotide sequence of
the present invention can be determined conventionally using known computer programs. A 
preferred method for determining the best overall match between a query sequence (a 
sequence of the present invention) and a subject sequence, also referred to as a global
sequence alignment, can be determined using the FASTDB computer program based on the
algorithm of Brutlag et al. (Comp. App. Biosci. 6:237-245 (1990)). In a sequence alignment
the query and subject sequences are both DNA sequences. An RNA sequence can be
compared by converting U's to T's. The result of said global sequence alignment is in
percent identity. Preferred parameters used in a FASTDB alignment of DNA sequences to
calculate percent identity are: Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining
Penalty=30, Randomization Group Length=0, Cutoff Score=1, Gap Penalty=5, Gap Size
Penalty=0.05, Window Size=500 or the length of the subject nucleotide sequence, whichever
is shorter.
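
For convenience, the preferred DNA parameters and the two input rules stated above (U-to-T conversion and the window size) can be written down as a small sketch. This is illustrative only: the dictionary is simply a record of the values in the text, read with Mismatch Penalty and Cutoff Score equal to 1, and is not an input format or interface of the FASTDB program. The names `DNA_ALIGNMENT_PARAMETERS`, `rna_to_dna` and `window_size` are hypothetical.

```python
# Values as stated in the text. This is a plain record of the parameters,
# NOT a FASTDB command-line or API format (assumption for illustration only).
DNA_ALIGNMENT_PARAMETERS = {
    "Matrix": "Unitary",
    "k-tuple": 4,
    "Mismatch Penalty": 1,
    "Joining Penalty": 30,
    "Randomization Group Length": 0,
    "Cutoff Score": 1,
    "Gap Penalty": 5,
    "Gap Size Penalty": 0.05,
}


def rna_to_dna(sequence: str) -> str:
    """Convert U's to T's so that an RNA sequence can be compared as DNA."""
    return sequence.upper().replace("U", "T")


def window_size(subject: str, maximum: int = 500) -> int:
    """Window Size: 500 or the length of the subject sequence, whichever is shorter."""
    return min(maximum, len(subject))


print(rna_to_dna("AUGGCU"))    # ATGGCT
print(window_size("A" * 120))  # 120
print(window_size("A" * 800))  # 500
```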

If the subject sequence is shorter than the query sequence because of 5' or 3' 
deletions, not because of internal deletions, a manual correction must be made to the results. 
This is because the FASTDB program does not account for 5' and 3' truncations of the
subject sequence when calculating percent identity. For subject sequences truncated at the 5'
or 3' ends, relative to the query sequence, the percent identity is corrected by calculating the 
number of bases of the query sequence that are 5' and 3' of the subject sequence, which are 
not matched/aligned, as a percent of the total bases of the query sequence. Whether a 
nucleotide is matched/aligned is determined by results of the FASTDB sequence alignment.
This percentage is then subtracted from the percent identity, calculated by the above
FASTDB program using the specified parameters, to arrive at a final percent identity score. 
This corrected score is what is used for the purposes of the present invention. Only bases 
outside the 5' and 3' bases of the subject sequence, as displayed by the FASTDB alignment, 
which are not matched/aligned with the query sequence, are calculated for the purposes of
manually adjusting the percent identity score.

For example, a 90 base subject sequence is aligned to a 100 base query sequence to 
determine percent identity. The deletions occur at the 5' end of the subject sequence and 
therefore, the FASTDB alignment does not show a match/alignment of the first 10 bases at the
5' end. The 10 unpaired bases represent 10% of the sequence (number of bases at the 5' and
3' ends not matched/total number of bases in the query sequence) so 10% is subtracted from
the percent identity score calculated by the FASTDB program. If the remaining 90 bases
were perfectly matched the final percent identity would be 90%. In another example, a 90
base subject sequence is compared with a 100 base query sequence. This time the deletions
are internal deletions so that there are no bases on the 5' or 3' of the subject sequence which
are not matched/aligned with the query. In this case the percent identity calculated by
FASTDB is not manually corrected. Once again, only bases 5' and 3' of the subject sequence
which are not matched/aligned with the query sequence are manually corrected for. No other
manual corrections are to be made for the purposes of the present invention.
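
The manual correction described in the two examples above can be sketched as a short calculation. The sketch is illustrative and not part of the original disclosure: the function name `corrected_percent_identity` and its argument names are hypothetical, and the percent identity reported by the alignment program is supplied as an input, not computed.

```python
def corrected_percent_identity(
    fastdb_percent_identity: float,
    query_length: int,
    unmatched_5_prime: int = 0,
    unmatched_3_prime: int = 0,
) -> float:
    """Subtract the share of query bases lying outside the subject's 5' and 3' ends.

    Only query bases beyond the ends of the subject sequence are counted;
    internal deletions lead to no manual correction.
    """
    unmatched = unmatched_5_prime + unmatched_3_prime
    if (
        query_length <= 0
        or min(unmatched_5_prime, unmatched_3_prime) < 0
        or unmatched > query_length
    ):
        raise ValueError("inconsistent lengths")
    penalty = 100.0 * unmatched / query_length
    return fastdb_percent_identity - penalty


# First example: 90 base subject, 100 base query, first 10 bases at the 5' end unmatched,
# remaining 90 bases perfectly matched.
print(corrected_percent_identity(100.0, 100, unmatched_5_prime=10))  # 90.0

# Second example: the deletions are internal, so the reported score is left unchanged.
# The value 88.0 is a hypothetical reported score used only for illustration.
print(corrected_percent_identity(88.0, 100))  # 88.0
```

The same subtraction applies to polypeptides, with N- and C-terminal residues taking the place of 5' and 3' bases.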

By a polypeptide having an amino acid sequence at least, for example, 95% 
"identical" to a query amino acid sequence of the present invention, it is intended that the 
amino acid sequence of the subject polypeptide is identical to the query sequence except that 
the subject polypeptide sequence may include up to five amino acid alterations per each 100
amino acids of the query amino acid sequence. In other words, to obtain a polypeptide 
having an amino acid sequence at least 95% identical to a query amino acid sequence, up to 
5% of the amino acid residues in the subject sequence may be inserted, deleted, (indels) or 
substituted with another amino acid. These alterations of the reference sequence may occur
at the amino or carboxy terminal positions of the reference amino acid sequence or anywhere
between those terminal positions, interspersed either individually among residues in the 
reference sequence or in one or more contiguous groups within the reference sequence. 

As a practical matter, whether any particular polypeptide is at least 80%, 85%, 90%, 
95%, 96%, 97%, 98% or 99% identical to, for instance, the amino acid sequence in SEQ ID
NO:Y or a fragment thereof, the amino acid sequence encoded by the nucleotide sequence in
SEQ ID NO:X or a fragment thereof, or the amino acid sequence encoded by the cDNA in
the related cDNA clone contained in a deposited library, or a fragment thereof, can be
determined conventionally using known computer programs. A preferred method for
determining the best overall match between a query sequence (a sequence of the present
invention) and a subject sequence, also referred to as a global sequence alignment, can be
determined using the FASTDB computer program based on the algorithm of Brutlag et al.
(Comp. App. Biosci. 6:237-245 (1990)). In a sequence alignment the query and subject
sequences are either both nucleotide sequences or both amino acid sequences. The result of
said global sequence alignment is in percent identity. Preferred parameters used in a
FASTDB amino acid alignment are: Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1,
Joining Penalty=20, Randomization Group Length=0, Cutoff Score=1, Window
Size=sequence length, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the
length of the subject amino acid sequence, whichever is shorter.

If the subject sequence is shorter than the query sequence due to N- or C-terminal
deletions, not because of internal deletions, a manual correction must be made to the results.
This is because the FASTDB program does not account for N- and C-terminal truncations of 
the subject sequence when calculating global percent identity. For subject sequences 






WO 00/55320 



PCT/US00/05989 



truncated at the N- and C-termini, relative to the query sequence, the percent identity is 
corrected by calculating the number of residues of the query sequence that are N- and C- 
terminal of the subject sequence, which are not matched/aligned with a corresponding subject 
residue, as a percent of the total residues of the query sequence. Whether a residue is
matched/aligned is determined by results of the FASTDB sequence alignment. This
percentage is then subtracted from the percent identity, calculated by the above FASTDB 
program using the specified parameters, to arrive at a final percent identity score. This final 
percent identity score is what is used for the purposes of the present invention. Only residues 
to the N- and C-termini of the subject sequence, which are not matched/aligned with the
query sequence, are considered for the purposes of manually adjusting the percent identity
score. That is, only query residue positions outside the farthest N- and C-terminal residues
of the subject sequence. 

For example, a 90 amino acid residue subject sequence is aligned with a 100 residue 
query sequence to determine percent identity. The deletion occurs at the N-terminus of the
subject sequence and therefore, the FASTDB alignment does not show a matching/alignment
of the first 10 residues at the N-terminus. The 10 unpaired residues represent 10% of the 
sequence (number of residues at the N- and C- termini not matched/total number of residues 
in the query sequence) so 10% is subtracted from the percent identity score calculated by the 
FASTDB program. If the remaining 90 residues were perfectly matched the final percent
identity would be 90%. In another example, a 90 residue subject sequence is compared with
a 100 residue query sequence. This time the deletions are internal deletions so there are no 
residues at the N- or C-termini of the subject sequence which are not matched/aligned with 
the query. In this case the percent identity calculated by FASTDB is not manually corrected. 
Once again, only residue positions outside the N- and C-terminal ends of the subject
sequence, as displayed in the FASTDB alignment, which are not matched/aligned with the
query sequence are manually corrected for. No other manual corrections are to be made for the
purposes of the present invention.

The variants may contain alterations in the coding regions, non-coding regions, or 
both. Especially preferred are polynucleotide variants containing alterations which produce
silent substitutions, additions, or deletions, but do not alter the properties or activities of the
encoded polypeptide. Nucleotide variants produced by silent substitutions due to the
degeneracy of the genetic code are preferred. Moreover, variants in which less than 50, less
than 40, less than 30, less than 20, less than 10, or 5-50, 5-25, 5-10, 1-5, or 1-2 amino acids
are substituted, deleted, or added in any combination are also preferred. Polynucleotide
variants can be produced for a variety of reasons, e.g., to optimize codon expression for a
particular host (change codons in the human mRNA to those preferred by a bacterial host
such as E. coli).
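
The degeneracy of the genetic code referred to above can be made concrete with a short sketch. It is illustrative only and not part of the original disclosure: it assumes the standard genetic code, and the function name `count_coding_sequences` and the example peptides are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, with codons enumerated in TCAG order (assumption).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of synonymous codons available for each amino acid.
SYNONYMOUS_CODONS = Counter(CODON_TABLE.values())


def count_coding_sequences(peptide: str) -> int:
    """Number of distinct nucleotide sequences that encode the given peptide."""
    return prod(SYNONYMOUS_CODONS[aa] for aa in peptide)


# GCC and GCT both encode alanine, so GCC -> GCT is a silent substitution.
print(CODON_TABLE["GCC"], CODON_TABLE["GCT"])  # A A
print(count_coding_sequences("MAK"))           # 8  (1 x 4 x 2)
print(count_coding_sequences("LSR"))           # 216  (6 x 6 x 6)
```

Because the count multiplies across residues, even a short polypeptide is encoded by a very large number of distinct polynucleotide variants.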

Naturally occurring variants are called "allelic variants," and refer to one of several 
alternate forms of a gene occupying a given locus on a chromosome of an organism. (Genes 
II, Lewin, B., ed., John Wiley & Sons, New York (1985).) These allelic variants can vary at
either the polynucleotide and/or polypeptide level and are included in the present invention.
Alternatively, non-naturally occurring variants may be produced by mutagenesis techniques
or by direct synthesis.

Using known methods of protein engineering and recombinant DNA technology, 
variants may be generated to improve or alter the characteristics of the polypeptides of the 
present invention. For instance, as discussed herein, one or more amino acids can be deleted
from the N-terminus or C-terminus of the polypeptide of the present invention without
substantial loss of biological function. The authors of Ron et al., J. Biol. Chem. 268:2984-2988
(1993), reported variant KGF proteins having heparin binding activity even after
deleting 3, 8, or 27 amino-terminal amino acid residues. Similarly, Interferon gamma
exhibited up to ten times higher activity after deleting 8-10 amino acid residues from the
carboxy terminus of this protein. (Dobeli et al., J. Biotechnology 7:199-216 (1988).)

Moreover, ample evidence demonstrates that variants often retain a biological activity 
similar to that of the naturally occurring protein. For example, Gayle and coworkers (J. Biol. 
Chem. 268:22105-22111 (1993)) conducted extensive mutational analysis of human cytokine
IL-1a. They used random mutagenesis to generate over 3,500 individual IL-1a mutants that
averaged 2.5 amino acid changes per variant over the entire length of the molecule. Multiple
mutations were examined at every possible amino acid position. The investigators found that
"[m]ost of the molecule could be altered with little effect on either [binding or biological
activity]." (See, Abstract.) In fact, only 23 unique amino acid sequences, out of more than
3,500 nucleotide sequences examined, produced a protein that significantly differed in
activity from wild-type.

Furthermore, as discussed herein, even if deleting one or more amino acids from the 
N-terminus or C-terminus of a polypeptide results in modification or loss of one or more 
biological functions, other biological activities may still be retained. For example, the ability
of a deletion variant to induce and/or to bind antibodies which recognize the secreted form 
will likely be retained when less than the majority of the residues of the secreted form are 
removed from the N-terminus or C-terminus. Whether a particular polypeptide lacking N- or 
C-terminal residues of a protein retains such immunogenic activities can readily be
determined by routine methods described herein and otherwise known in the art. 

Thus, the invention further includes polypeptide variants which show a functional 
activity (e.g., biological activity) of the polypeptide of the invention of which they are a 
variant. Such variants include deletions, insertions, inversions, repeats, and substitutions 
selected according to general rules known in the art so as to have little effect on activity.

The present application is directed to nucleic acid molecules at least 80%, 85%, 90%, 
95%, 96%, 97%, 98%, 99% or 100% identical to the nucleic acid sequences disclosed herein
or fragments thereof (e.g., including but not limited to fragments encoding a polypeptide
having the amino acid sequence of an N and/or C terminal deletion), irrespective of whether
they encode a polypeptide having functional activity. This is because even where a particular
nucleic acid molecule does not encode a polypeptide having functional activity, one of skill
in the art would still know how to use the nucleic acid molecule, for instance, as a
hybridization probe or a polymerase chain reaction (PCR) primer. Uses of the nucleic acid
molecules of the present invention that do not encode a polypeptide having functional activity
include, inter alia, (1) isolating a gene or allelic or splice variants thereof in a cDNA library;
(2) in situ hybridization (e.g., "FISH") to metaphase chromosomal spreads to provide precise 
chromosomal location of the gene, as described in Verma et al., Human Chromosomes: A 
Manual of Basic Techniques, Pergamon Press, New York (1988); and (3) Northern Blot 
analysis for detecting mRNA expression in specific tissues. 

Preferred, however, are nucleic acid molecules having sequences at least 80%, 85%,
90%, 95%, 96%, 97%, 98%, 99% or 100% identical to the nucleic acid sequences disclosed
herein, which do, in fact, encode a polypeptide having a functional activity of a polypeptide 
of the invention. 

Of course, due to the degeneracy of the genetic code, one of ordinary skill in the art 
will immediately recognize that a large number of the nucleic acid molecules having a
sequence at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to, for
example, the nucleic acid sequence of the cDNA in the related cDNA clone contained in a
deposited library, the nucleic acid sequence referred to in Table I (SEQ ID NO:X), or
fragments thereof, will encode polypeptides "having functional activity." In fact, since
degenerate variants of any of these nucleotide sequences all encode the same polypeptide, in
many instances, this will be clear to the skilled artisan even without performing the above
described comparison assay. It will be further recognized in the art that, for such nucleic acid
molecules that are not degenerate variants, a reasonable number will also encode a
polypeptide having functional activity. This is because the skilled artisan is fully aware of
amino acid substitutions that are either less likely or not likely to significantly affect protein
function (e.g., replacing one aliphatic amino acid with a second aliphatic amino acid), as
further described below.
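
By way of illustration only, the effect of codon degeneracy described above may be sketched in Python as follows. The two nucleotide sequences are hypothetical and are not sequences of the invention. The helper percent_identity is a simplifying assumption that counts matching positions over two ungapped sequences of equal length; it is not the sequence comparison algorithm relied upon elsewhere herein.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding sequence read in frame from its first base."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical sequences: both encode Met-Ala-Leu-Ser-Gly.
reference = "ATGGCTCTGAGCGGT"
degenerate = "ATGGCCTTATCAGGA"  # synonymous codons at four of the five positions

print(translate(reference), translate(degenerate))
print(round(percent_identity(reference, degenerate), 1))
```

In this sketch the two sequences match at only 8 of 15 nucleotide positions, yet both translate to the same five-residue polypeptide, which is the point made above regarding degenerate variants.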

For example, guidance concerning how to make phenotypically silent amino acid 
substitutions is provided in Bowie et al., "Deciphering the Message in Protein Sequences:
Tolerance to Amino Acid Substitutions," Science 247:1306-1310 (1990), wherein the authors
indicate that there are two main strategies for studying the tolerance of an amino acid
sequence to change.

The first strategy exploits the tolerance of amino acid substitutions by natural 
selection during the process of evolution. By comparing amino acid sequences in different 
species, conserved amino acids can be identified. These conserved amino acids are likely 
important for protein function. In contrast, the amino acid positions where substitutions have
been tolerated by natural selection indicate that these positions are not critical for protein
function. Thus, positions tolerating amino acid substitution could be modified while still 
maintaining biological activity of the protein. 
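
As a minimal sketch of this first strategy, and assuming the orthologous sequences have already been aligned to a common length, conserved and variable positions may be identified as shown below. The three sequences are hypothetical and the function name is an illustrative assumption.

```python
def conserved_positions(aligned: list[str]) -> list[int]:
    """Return 1-based alignment columns in which every sequence has the same residue."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must share one length")
    return [
        index + 1
        for index, column in enumerate(zip(*aligned))
        # a column of gap characters is not counted as a conserved residue
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Hypothetical, pre-aligned orthologs (not sequences of the invention).
orthologs = ["MKTAYLAG", "MKSAYVAG", "MRTAYIAG"]

conserved = conserved_positions(orthologs)
variable = [p for p in range(1, len(orthologs[0]) + 1) if p not in conserved]
print(conserved)  # candidate positions important for function
print(variable)   # candidate positions tolerating substitution
```

Here positions 1, 4, 5, 7 and 8 are invariant across the three sequences, while positions 2, 3 and 6 have tolerated substitution and would be the first candidates for modification.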

The second strategy uses genetic engineering to introduce amino acid changes at 
specific positions of a cloned gene to identify regions critical for protein function. For
example, site directed mutagenesis or alanine-scanning mutagenesis (introduction of single
alanine mutations at every residue in the molecule) can be used. (Cunningham and Wells, 
Science 244:1081-1085 (1989).) The resulting mutant molecules can then be tested for 
biological activity. 
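
The set of mutants contemplated by alanine-scanning mutagenesis may be enumerated, purely as an illustrative sketch, as follows. The input sequence is hypothetical, the mutant labels follow the common "K2A" convention, and skipping positions that are already alanine is an assumption of this sketch rather than a requirement stated herein.

```python
def alanine_scan(protein: str) -> dict[str, str]:
    """Map a label such as 'K2A' to the sequence carrying that single Ala substitution."""
    mutants = {}
    for index, residue in enumerate(protein):
        if residue == "A":
            continue  # already alanine, so there is no substitution to test here
        label = f"{residue}{index + 1}A"
        mutants[label] = protein[:index] + "A" + protein[index + 1:]
    return mutants


# Hypothetical four-residue sequence used only to show the output shape.
for label, sequence in alanine_scan("MKAL").items():
    print(label, sequence)
```

Each resulting mutant would then be expressed and tested for biological activity, as stated above.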

As the authors state, these two strategies have revealed that proteins are surprisingly 
tolerant of amino acid substitutions. The authors further indicate which amino acid changes
are likely to be permissive at certain amino acid positions in the protein. For example, most
buried (within the tertiary structure of the protein) amino acid residues require nonpolar side
chains, whereas few features of surface side chains are generally conserved. Moreover,
tolerated conservative amino acid substitutions involve replacement of the aliphatic or
hydrophobic amino acids Ala, Val, Leu and Ile; replacement of the hydroxyl residues Ser and
Thr; replacement of the acidic residues Asp and Glu; replacement of the amide residues Asn
and Gln; replacement of the basic residues Lys, Arg, and His; replacement of the aromatic
residues Phe, Tyr, and Trp; and replacement of the small-sized amino acids Ala, Ser, Thr,
Met, and Gly. Besides conservative amino acid substitution, variants of the present invention
include (i) substitutions with one or more of the non-conserved amino acid residues, where
the substituted amino acid residues may or may not be one encoded by the genetic code, or
(ii) substitution with one or more of amino acid residues having a substituent group, or (iii)
fusion of the mature polypeptide with another compound, such as a compound to increase the
stability and/or solubility of the polypeptide (for example, polyethylene glycol), or (iv) fusion
of the polypeptide with additional amino acids, such as, for example, an IgG Fc fusion region
peptide, or leader or secretory sequence, or a sequence facilitating purification. Such variant
polypeptides are deemed to be within the scope of those skilled in the art from the teachings
herein. 
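
The conservative substitution groups recited above lend themselves to a simple lookup. The following Python sketch, offered for illustration only, encodes those seven groups using one-letter amino acid codes and reports, for two hypothetical sequences of equal length, each substituted position and whether the substitution falls within one of the groups. The function names are illustrative assumptions.

```python
# Groups taken from the conservative substitutions listed in the text above.
CONSERVATIVE_GROUPS = {
    "aliphatic/hydrophobic": set("AVLI"),  # Ala, Val, Leu, Ile
    "hydroxyl": set("ST"),                 # Ser, Thr
    "acidic": set("DE"),                   # Asp, Glu
    "amide": set("NQ"),                    # Asn, Gln
    "basic": set("KRH"),                   # Lys, Arg, His
    "aromatic": set("FYW"),                # Phe, Tyr, Trp
    "small": set("ASTMG"),                 # Ala, Ser, Thr, Met, Gly
}


def is_conservative(old: str, new: str) -> bool:
    """True if two different residues share at least one of the groups above."""
    return old != new and any(
        old in group and new in group for group in CONSERVATIVE_GROUPS.values()
    )


def classify_substitutions(reference: str, variant: str):
    """List (position, old, new, conservative?) for each substituted position."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only, not indels")
    return [
        (index + 1, old, new, is_conservative(old, new))
        for index, (old, new) in enumerate(zip(reference, variant))
        if old != new
    ]


# Hypothetical sequences (not sequences of the invention).
for row in classify_substitutions("MKTLDE", "MRTVDW"):
    print(row)
```

In this example the Lys to Arg and Leu to Val changes are reported as conservative, whereas the Glu to Trp change is not. The length of the returned list is also the substitution count referred to in the embodiments that limit the number of substitutions.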

For example, polypeptide variants containing amino acid substitutions of charged 
amino acids with other charged or neutral amino acids may produce proteins with improved 
characteristics, such as less aggregation. Aggregation of pharmaceutical formulations both
reduces activity and increases clearance due to the aggregate's immunogenic activity.
(Pinckard et al., Clin. Exp. Immunol. 2:331-340 (1967); Robbins et al., Diabetes 36: 838-845 
(1987); Cleland et al., Crit. Rev. Therapeutic Drug Carrier Systems 10:307-377 (1993).) 

A further embodiment of the invention relates to a polypeptide which comprises the 
amino acid sequence of a polypeptide having an amino acid sequence which contains at least
one amino acid substitution, but not more than 50 amino acid substitutions, even more
preferably, not more than 40 amino acid substitutions, still more preferably, not more than 30
amino acid substitutions, and still even more preferably, not more than 20 amino acid
substitutions. Of course it is highly preferable for a polypeptide to have an amino acid
sequence which comprises the amino acid sequence of a polypeptide of SEQ ID NO:Y, an
amino acid sequence encoded by SEQ ID NO:X, and/or the amino acid sequence encoded by
the cDNA in the related cDNA clone contained in a deposited library which contains, in order
of ever-increasing preference, at least one, but not more than 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1
amino acid substitutions. In specific embodiments, the number of additions, substitutions,
and/or deletions in the amino acid sequence of SEQ ID NO:Y or fragments thereof (e.g., the
mature form and/or other fragments described herein), an amino acid sequence encoded by
SEQ ID NO:X or fragments thereof, and/or the amino acid sequence encoded by the cDNA in
the related cDNA clone contained in a deposited library or fragments thereof, is 1-5, 5-10,
5-25, 5-50, 10-50 or 50-150; conservative amino acid substitutions are preferable.

Polynucleotide and Polypeptide Fragments 

The present invention is also directed to polynucleotide fragments of the pancreas and
pancreatic cancer polynucleotides (nucleic acids) of the invention. In the present invention, a
"polynucleotide fragment" refers, for example, to a polynucleotide having a nucleic acid
sequence which: is a portion of the cDNA contained in a deposited cDNA clone; or is a
portion of a polynucleotide sequence encoding the polypeptide encoded by the cDNA
contained in a deposited cDNA clone; or is a portion of the polynucleotide sequence in SEQ
ID NO:X or the complementary strand thereto; or is a polynucleotide sequence encoding a
portion of the polypeptide of SEQ ID NO:Y; or is a polynucleotide sequence encoding a
portion of a polypeptide encoded by SEQ ID NO:X or the complementary strand thereto.
The nucleotide fragments of the invention are preferably at least about 15 nt, and more
preferably at least about 20 nt, still more preferably at least about 30 nt, and even more
preferably, at least about 40 nt, at least about 50 nt, at least about 75 nt, at least about 100 nt,
at least about 125 nt or at least about 150 nt in length. A fragment "at least 20 nt in length,"
for example, is intended to include 20 or more contiguous bases from, for example, the
sequence contained in the cDNA in a related cDNA clone contained in a deposited library,
the nucleotide sequence shown in SEQ ID NO:X or the complementary strand thereto. In this
context "about" includes the particularly recited value or a value larger or smaller by several
(5, 4, 3, 2, or 1) nucleotides. These nucleotide fragments have uses that include, but are not 
limited to, as diagnostic probes and primers as discussed herein. Of course, larger fragments 
(e.g., at least 150, 175, 200, 250, 500, 600, 1000, or 2000 nucleotides in length) are also 
encompassed by the invention. 
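
The requirement that a fragment consist of contiguous bases from a reference sequence or from the strand complementary thereto may be checked, as a minimal sketch, in the manner shown below. The reference sequence is hypothetical, the function names are illustrative assumptions, and only the unambiguous bases A, C, G and T are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_fragment(candidate: str, reference: str, min_length: int = 20) -> bool:
    """True if candidate is at least min_length contiguous bases of either strand."""
    candidate, reference = candidate.upper(), reference.upper()
    if len(candidate) < min_length:
        return False
    return candidate in reference or candidate in reverse_complement(reference)


# Hypothetical 45 nt reference (not SEQ ID NO:X).
reference_seq = "ATGGCCAAGCTTGGATCCGAATTCTGCAGATATCCATCACACTGG"
probe = reference_seq[5:27]  # 22 contiguous bases of the given strand

print(is_fragment(probe, reference_seq))                      # given strand
print(is_fragment(reverse_complement(probe), reference_seq))  # complementary strand
print(is_fragment(reference_seq[5:20], reference_seq))        # only 15 nt long
```

The default of 20 corresponds to the "at least 20 nt in length" example above; passing a different min_length value applies any of the other recited lengths.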

Moreover, representative examples of polynucleotide fragments of the invention
include, for example, fragments comprising, or alternatively consisting of, a sequence from
about nucleotide number 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350,
351-400, 401-450, 451-500, 501-550, 551-600, 601-650, 651-700, 701-750, 751-800, 801-850,
851-900, 901-950, 951-1000, 1001-1050, 1051-1100, 1101-1150, 1151-1200, 1201-1250,
1251-1300, 1301-1350, 1351-1400, 1401-1450, 1451-1500, 1501-1550, 1551-1600, 1601-1650,
1651-1700, 1701-1750, 1751-1800, 1801-1850, 1851-1900, 1901-1950, 1951-2000, 2001-2050,
2051-2100, 2101-2150, 2151-2200, 2201-2250, 2251-2300, 2301-2350, 2351-2400,
2401-2450, 2451-2500, 2501-2550, 2551-2600, 2601-2650, 2651-2700, 2701-2750, 2751-2800,
2801-2850, 2851-2900, 2901-2950, 2951-3000, 3001-3050, 3051-3100, 3101-3150,
3151-3200, 3201-3250, 3251-3300, 3301-3350, 3351-3400, 3401-3450, 3451-3500, 3501-3550,
and 3551 to the end of SEQ ID NO:X, or the complementary strand thereto. In this context
"about" includes the particularly recited range or a range larger or smaller by several (5, 4, 3,
2, or 1) nucleotides, at either terminus or at both termini. Preferably, these fragments encode
a polypeptide which has a functional activity (e.g., biological activity) of the polypeptide
encoded by the polynucleotide of which the sequence is a portion. More preferably, these
fragments can be used as probes or primers as discussed herein. Polynucleotides which
hybridize to one or more of these nucleic acid molecules under stringent hybridization
conditions or alternatively, under lower stringency conditions, are also encompassed by the 
invention, as are polypeptides encoded by these polynucleotides or fragments. 
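
The consecutive 50-nucleotide ranges recited above follow a regular pattern that may be generated for a sequence of any length. The sketch below is illustrative only; the function names and the parameter slack are assumptions, with slack set to 5 to reflect the largest adjustment contemplated by the term "about" as defined above.

```python
def fragment_windows(length: int, step: int = 50) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) ranges; the last range runs to the sequence end."""
    return [
        (start, min(start + step - 1, length))
        for start in range(1, length + 1, step)
    ]


def widen(window: tuple[int, int], length: int, slack: int = 5) -> tuple[int, int]:
    """Largest range still covered by 'about': slack positions at both termini."""
    start, end = window
    return max(1, start - slack), min(length, end + slack)


# A hypothetical 130 nt sequence yields three ranges.
print(fragment_windows(130))
print(widen((51, 100), 130))
# The same helper gives 20-residue ranges for a hypothetical 60-residue polypeptide.
print(fragment_windows(60, step=20))
```

Using a step of 20 reproduces the pattern of amino acid ranges used for polypeptide fragments, so a single helper covers both the polynucleotide and polypeptide cases.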

Moreover, representative examples of polynucleotide fragments of the invention
include, for example, fragments comprising, or alternatively consisting of, a sequence from
about nucleotide number 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350,
351-400, 401-450, 451-500, 501-550, 551-600, 601-650, 651-700, 701-750, 751-800, 801-850,
851-900, 901-950, 951-1000, 1001-1050, 1051-1100, 1101-1150, 1151-1200, 1201-1250,
1251-1300, 1301-1350, 1351-1400, 1401-1450, 1451-1500, 1501-1550, 1551-1600, 1601-1650,
1651-1700, 1701-1750, 1751-1800, 1801-1850, 1851-1900, 1901-1950, 1951-2000, 2001-2050,
2051-2100, 2101-2150, 2151-2200, 2201-2250, 2251-2300, 2301-2350, 2351-2400,
2401-2450, 2451-2500, 2501-2550, 2551-2600, 2601-2650, 2651-2700, 2701-2750, 2751-2800,
2801-2850, 2851-2900, 2901-2950, 2951-3000, 3001-3050, 3051-3100, 3101-3150,
3151-3200, 3201-3250, 3251-3300, 3301-3350, 3351-3400, 3401-3450, 3451-3500, 3501-3550,
and 3551 to the end of the cDNA nucleotide sequence contained in the deposited cDNA
clone, or the complementary strand thereto. In this context "about" includes the particularly
recited range, or a range larger or smaller by several (5, 4, 3, 2, or 1) nucleotides, at either
terminus or at both termini. Preferably, these fragments encode a polypeptide which has a
functional activity (e.g., biological activity) of the polypeptide encoded by the cDNA
nucleotide sequence contained in the deposited cDNA clone. More preferably, these 
fragments can be used as probes or primers as discussed herein. Polynucleotides which 
hybridize to one or more of these fragments under stringent hybridization conditions or 
alternatively, under lower stringency conditions, are also encompassed by the invention, as 
are polypeptides encoded by these polynucleotides or fragments. 

In the present invention, a "polypeptide fragment" refers to an amino acid sequence 
which is a portion of that contained in SEQ ID NO:Y, a portion of an amino acid sequence 
encoded by the polynucleotide sequence of SEQ ID NO:X, and/or encoded by the cDNA 
contained in the related cDNA clone contained in a deposited library. Protein (polypeptide) 
fragments may be "free-standing," or comprised within a larger polypeptide of which the 
fragment forms a part or region, most preferably as a single continuous region. 
Representative examples of polypeptide fragments of the invention include, for example,
fragments comprising, or alternatively consisting of, an amino acid sequence from about
amino acid number 1-20, 21-40, 41-60, 61-80, 81-100, 101-120, 121-140, 141-160, 161-180,
181-200, 201-220, 221-240, 241-260, 261-280, 281-300, 301-320, 321-340, 341-360,
361-380, 381-400, 401-420, 421-440, 441-460, 461-480, 481-500, 501-520, 521-540, 541-560,
561-580, 581-600, 601-620, 621-640, 641-660, 661-680, 681-700, 701-720, 721-740,
741-760, 761-780, 781-800, 801-820, 821-840, 841-860, 861-880, 881-900, 901-920, 921-940,
941-960, 961-980, 981-1000, 1001-1020, 1021-1040, 1041-1060, 1061-1080, 1081-1100,
1101-1120, 1121-1140, 1141-1160, 1161-1180, and 1181 to the end of SEQ ID NO:Y.
Moreover, polypeptide fragments of the invention may be at least about 10, 15, 20, 25, 30,
35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 110, 120, 130, 140, or 150 amino acids in
length. In this context "about" includes the particularly recited ranges or values, or ranges or 
values larger or smaller by several (5, 4, 3, 2, or 1) amino acids, at either terminus or at both 
termini. Polynucleotides encoding these polypeptide fragments are also encompassed by the 
invention. 

Even if deletion of one or more amino acids from the N-terminus of a protein results 
in modification or loss of one or more biological functions of the protein, other functional
activities (e.g., biological activities, ability to multimerize, ability to bind a ligand) may still
be retained. For example, the ability of shortened muteins to induce and/or bind to antibodies
which recognize the complete or mature forms of the polypeptides generally will be retained
when less than the majority of the residues of the complete or mature polypeptide are
removed from the N-terminus. Whether a particular polypeptide lacking N-terminal residues
of a complete polypeptide retains such immunologic activities can readily be determined by
routine methods described herein and otherwise known in the art. It is not unlikely that a
mutein with a large number of deleted N-terminal amino acid residues may retain some
biological or immunogenic activities. In fact, peptides composed of as few as six amino acid 
residues may often evoke an immune response. 

Accordingly, polypeptide fragments of the invention include the secreted protein as 
well as the mature form. Further preferred polypeptide fragments include the secreted protein
or the mature form having a continuous series of deleted residues from the amino or the
carboxy terminus, or both. For example, any number of amino acids, ranging from 1-60, can
be deleted from the amino terminus of either the secreted polypeptide or the mature form.
Similarly, any number of amino acids, ranging from 1-30, can be deleted from the carboxy
terminus of the secreted protein or mature form. Furthermore, any combination of the above
amino and carboxy terminus deletions is preferred. Similarly, polynucleotides encoding
these polypeptide fragments are also preferred. 

The present invention further provides polypeptides having one or more residues 
deleted from the amino terminus of the amino acid sequence of a polypeptide disclosed 
herein (e.g., a polypeptide of SEQ ID NO:Y, a polypeptide encoded by the polynucleotide
sequence contained in SEQ ID NO:X, and/or a polypeptide encoded by the cDNA contained
in the related cDNA clone contained in a deposited library). In particular, N-terminal
deletions may be described by the general formula m-q, where q is a whole integer
representing the total number of amino acid residues in a polypeptide of the invention (e.g.,
the polypeptide disclosed in SEQ ID NO:Y), and m is defined as any integer ranging from 2
to q-6. Polynucleotides encoding these polypeptides are also encompassed by the invention.

Also as mentioned above, even if deletion of one or more amino acids from the 
C-terminus of a protein results in modification or loss of one or more biological functions of
the protein, other functional activities (e.g., biological activities, ability to multimerize,
ability to bind a ligand) may still be retained. For example, the ability of the shortened mutein
to induce and/or bind to antibodies which recognize the complete or mature forms of the
polypeptide generally will be retained when less than the majority of the residues of the
complete or mature polypeptide are removed from the C-terminus. Whether a particular
polypeptide lacking C-terminal residues of a complete polypeptide retains such immunologic
activities can readily be determined by routine methods described herein and otherwise
known in the art. It is not unlikely that a mutein with a large number of deleted C-terminal
amino acid residues may retain some biological or immunogenic activities. In fact, peptides
composed of as few as six amino acid residues may often evoke an immune response.

Accordingly, the present invention further provides polypeptides having one or more 
residues deleted from the carboxy terminus of the amino acid sequence of a polypeptide disclosed
herein (e.g., a polypeptide of SEQ ID NO:Y, a polypeptide encoded by the polynucleotide
sequence contained in SEQ ID NO:X, and/or a polypeptide encoded by the cDNA contained
in the deposited cDNA clone referenced in Table I). In particular, C-terminal deletions may be
described by the general formula 1-n, where n is any whole integer ranging from 6 to q-1, and
where n corresponds to the position of an amino acid residue in a polypeptide of the 
invention. Polynucleotides encoding these polypeptides are also encompassed by the 
invention. 

In addition, any of the above described N- or C-terminal deletions can be combined to
produce an N- and C-terminal deleted polypeptide. The invention also provides polypeptides
having one or more amino acids deleted from both the amino and the carboxyl termini, which
may be described generally as having residues m-n of a polypeptide encoded by SEQ ID
NO:X (e.g., including, but not limited to, the preferred polypeptide disclosed as SEQ ID
NO:Y), and/or the cDNA in the related cDNA clone contained in a deposited library, where n
and m are integers as described above. Polynucleotides encoding these polypeptides are also 
encompassed by the invention. 
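
The general formulas m-q, 1-n and m-n set out above may be enumerated, as an illustrative sketch only, for a polypeptide of q residues. The ten-residue sequence is hypothetical. The text places no explicit minimum on the length of a combined m-n fragment; the min_length parameter of six residues is an assumption of this sketch, chosen to match the statement that peptides of as few as six residues may evoke an immune response.

```python
def n_terminal_deletions(protein: str) -> dict[tuple[int, int], str]:
    """Fragments m-q, with m ranging from 2 to q-6 (1-based, inclusive)."""
    q = len(protein)
    return {(m, q): protein[m - 1:] for m in range(2, q - 6 + 1)}


def c_terminal_deletions(protein: str) -> dict[tuple[int, int], str]:
    """Fragments 1-n, with n ranging from 6 to q-1 (1-based, inclusive)."""
    q = len(protein)
    return {(1, n): protein[:n] for n in range(6, q)}


def combined_deletions(protein: str, min_length: int = 6) -> dict[tuple[int, int], str]:
    """Fragments m-n using the same ranges of m and n as above.

    min_length is an assumption of this sketch, not a limit stated in the text.
    """
    q = len(protein)
    return {
        (m, n): protein[m - 1:n]
        for m in range(2, q - 6 + 1)
        for n in range(6, q)
        if n - m + 1 >= min_length
    }


# Hypothetical 10-residue polypeptide, so q = 10.
protein = "MKTAYLAGSW"
print(n_terminal_deletions(protein))
print(c_terminal_deletions(protein))
print(combined_deletions(protein))
```

For q = 10 this yields three N-terminal deletions (m = 2, 3, 4), four C-terminal deletions (n = 6, 7, 8, 9) and, under the stated minimum-length assumption, six combined m-n fragments.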

Any polypeptide sequence contained in the polypeptide of SEQ ID NO:Y, encoded by 
the polynucleotide sequences set forth as SEQ ID NO:X, or encoded by the cDNA in the
related cDNA clone contained in a deposited library may be analyzed to determine certain
preferred regions of the polypeptide. For example, the amino acid sequence of a polypeptide
encoded by a polynucleotide sequence of SEQ ID NO:X, or the cDNA in a deposited cDNA
clone may be analyzed using the default parameters of the DNASTAR computer algorithm
(DNASTAR, Inc., 1228 S. Park St., Madison, WI 53715 USA; http://www.dnastar.com/).

Polypeptide regions that may be routinely obtained using the DNASTAR computer
algorithm include, but are not limited to, Garnier-Robson alpha-regions, beta-regions,
turn-regions, and coil-regions, Chou-Fasman alpha-regions, beta-regions, and turn-regions,
Kyte-Doolittle hydrophilic regions and hydrophobic regions, Eisenberg alpha- and
beta-amphipathic regions, Karplus-Schulz flexible regions, Emini surface-forming regions
and Jameson-Wolf regions of high antigenic index. Among highly preferred polynucleotides
of the invention in this regard are those that encode polypeptides comprising regions that
combine several structural features, such as several (e.g., 1, 2, 3 or 4) of the features set out
above. 

Additionally, Kyte-Doolittle hydrophilic regions and hydrophobic regions, Emini 
surface-forming regions, and Jameson-Wolf regions of high antigenic index (i.e., containing 
four or more contiguous amino acids having an antigenic index of greater than or equal to
1.5, as identified using the default parameters of the Jameson-Wolf program) can routinely be
used to determine polypeptide regions that exhibit a high degree of potential for antigenicity.
Regions of high antigenicity are determined from data by DNASTAR analysis by choosing
values which represent regions of the polypeptide which are likely to be exposed on the
surface of the polypeptide in an environment in which antigen recognition may occur in the
process of initiation of an immune response.
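
The criterion stated above (four or more contiguous amino acids each having an antigenic index of at least 1.5) may be applied to a per-residue score profile as sketched below. This sketch does not compute the Jameson-Wolf index itself; it assumes the per-residue values have already been obtained from such an analysis, and the scores shown are hypothetical.

```python
def high_antigenic_regions(
    scores: list[float], threshold: float = 1.5, min_run: int = 4
) -> list[tuple[int, int]]:
    """Return 1-based (start, end) runs of >= min_run residues scoring >= threshold."""
    regions = []
    run_start = None
    for position, score in enumerate(scores, start=1):
        if score >= threshold:
            if run_start is None:
                run_start = position  # a new run begins here
        else:
            if run_start is not None and position - run_start >= min_run:
                regions.append((run_start, position - 1))
            run_start = None
    # close a run that extends to the last residue
    if run_start is not None and len(scores) - run_start + 1 >= min_run:
        regions.append((run_start, len(scores)))
    return regions


# Hypothetical per-residue antigenic index values for a 14-residue polypeptide.
profile = [0.2, 1.6, 1.7, 1.5, 2.0, 0.9, 1.8, 1.9, 1.6, 0.1, 1.5, 1.5, 1.5, 1.5]
print(high_antigenic_regions(profile))
```

With this profile, residues 2-5 and 11-14 qualify, while residues 7-9 do not because only three contiguous residues meet the threshold.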

Preferred polypeptide fragments of the invention are fragments comprising, or 
alternatively consisting of, an amino acid sequence that displays a functional activity of the 
polypeptide sequence of which the amino acid sequence is a fragment. 

By a polypeptide demonstrating a "functional activity" is meant a polypeptide
capable of displaying one or more known functional activities associated with a full-length
(complete) protein of the invention. Such functional activities include, but are not limited to,
biological activity, antigenicity [ability to bind (or compete with a polypeptide for binding)
to an anti-polypeptide antibody], immunogenicity (ability to generate antibody which binds to
a specific polypeptide of the invention), ability to form multimers with polypeptides of the
invention, and ability to bind to a receptor or ligand for a polypeptide.

Other preferred polypeptide fragments are biologically active fragments. Biologically 
active fragments are those exhibiting activity similar, but not necessarily identical, to an 
activity of the polypeptide of the present invention. The biological activity of the fragments 
may include an improved desired activity, or a decreased undesirable activity. 

In preferred embodiments, polypeptides of the invention comprise, or alternatively
consist of, one, two, three, four, five or more of the antigenic fragments of the polypeptide of
SEQ ID NO:Y, or portions thereof. Polynucleotides encoding these polypeptides are also
encompassed by the invention.

Table 4.

Sequence/
Contig ID   Epitope

462108      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 461 as
            residues: Ile-1 to Arg-9, Val-26 to Val-41, Met-46 to Cys-5h Trp-88 to Gln-93,
            Glu-124 to Trp-130, Gly-339 to Pro-344.

503446      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 4bl as
            residues: Leu-54 to Leu-60.

507841      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 463 as
            residues: Tyr-39 to Trp-44.

509287      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 464 as
            residues: Arg-6 to Val-12, Thr-38 to Asn-43, Arg-69 to Asp-74, Trp-87 to Lys-97,
            His-136 to Met-142, Ala-149 to Lys-160.

509672      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 465 as
            residues: Ser-33 to Cys-39.

524112      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 469 as
            residues: Asp-1 to Gly-6, Pro-30 to Gly-40, Leu-46 to Asn-52, Asp-54 to Gly-61.

525971      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 470 as
            residues: Pro-13 to Arp-21, Leu-30 to Thr-35, Pro-43 to Ser-51.

527156      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 471 as
            residues: Ala-2 to Pro-7.

532502      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 472 as
            residues: Lys-1 to Ser-6.

533459      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 473 as
            residues: Gly-1 to Trp-7, Ile-155 to Gly-163.

533551      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 474 as
            residues: Lys-15 to Leu-20.

537850      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 475 as
            residues: Ile-43 to Leu-49, Cys-85 to Lys-92, Phe-138 to Leu-144.

537925      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 476 as
            residues: Gln-17 to Ser-24, Ala-47 to Asn-52.

540802      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 479 as
            residues: Leu-3 to Trp-9, Arg-20 to Phe-29, Glu-58 to Gln-65.

540989      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 480 as
            residues: Ser-52 to Gly-57, Thr-64 to Asn-70.

540997      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 481 as
            residues: Ile-1 to Thr-11.

548735      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 482 as
            residues: Gln-17 to Asn-22, Ser-38 to Pro-45, Asn-75 to Leu-84, Glu-97 to Pro-110.

549709      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 483 as
            residues: Phe-65 to Trp-77.

550007      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 484 as
            residues: Ser-4 to Ser-13, Leu-22 to Cys-40, Gly-42 to Gly-50, Thr-88 to Glu-97,
            Leu-184 to Gln-190, Pro-206 to Gly-211.

550118      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 485 as
            residues: Gly-1 to Gly-7, Trp-10 to Met-24, Gln-91 to Gly-98.

550870      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 487 as
            residues: Arg-26 to Arg-33, Gln-47 to Asn-52, Trp-61 to Ser-71, Gly-93 to Trp-100.

553765      Preferred epitopes include those comprising a sequence shown in SEQ ID NO. 489 as
            residues: Thr-8 to Thr-19, Arg-108 to Ser-115, Ser-117 to Arg-12S, Phe-143 to
            Tyr-155, Leu-171 to Arg-177, Asn-182 to Gly-187, Gly-195 to Ser-200, Arg-232 to Thr-

What Is Claimed Is:

1. An isolated nucleic acid molecule comprising a polynucleotide having
a nucleotide sequence at least 95% identical to a sequence selected from the group
consisting of:

(a) a polynucleotide fragment of SEQ ID NO:X or a polynucleotide fragment
of the cDNA sequence included in the related cDNA clone, which is hybridizable to
SEQ ID NO:X;

(b) a polynucleotide encoding a polypeptide fragment of SEQ ID NO:Y or a
polypeptide fragment encoded by the cDNA sequence included in the related cDNA
clone, which is hybridizable to SEQ ID NO:X;

(c) a polynucleotide encoding a polypeptide fragment of a polypeptide
encoded by SEQ ID NO:X or a polypeptide fragment encoded by the cDNA sequence
included in the related cDNA clone, which is hybridizable to SEQ ID NO:X;

(d) a polynucleotide encoding a polypeptide domain of SEQ ID NO:Y or a
polypeptide domain encoded by the cDNA sequence included in the related cDNA
clone, which is hybridizable to SEQ ID NO:X;

(e) a polynucleotide encoding a polypeptide epitope of SEQ ID NO:Y or a
polypeptide epitope encoded by the cDNA sequence included in the related cDNA
clone, which is hybridizable to SEQ ID NO:X;

(f) a polynucleotide encoding a polypeptide of SEQ ID NO:Y or the cDNA
sequence included in the related cDNA clone, which is hybridizable to SEQ ID
NO:X, having biological activity;

(g) a polynucleotide which is a variant of SEQ ID NO:X;

(h) a polynucleotide which is an allelic variant of SEQ ID NO:X;

(i) a polynucleotide which encodes a species homologue of the SEQ ID NO:Y;

(j) a polynucleotide capable of hybridizing under stringent conditions to any
one of the polynucleotides specified in (a)-(i), wherein said polynucleotide does not
hybridize under stringent conditions to a nucleic acid molecule having a nucleotide
sequence of only A residues or of only T residues.

2. The isolated nucleic acid molecule of claim 1, wherein the
polynucleotide fragment comprises a nucleotide sequence encoding a protein.

3. The isolated nucleic acid molecule of claim 1, wherein the
polynucleotide fragment comprises a nucleotide sequence encoding the sequence
identified as SEQ ID NO:Y or the polypeptide encoded by the cDNA sequence
included in the related cDNA clone, which is hybridizable to SEQ ID NO:X.

4. The isolated nucleic acid molecule of claim 1, wherein the
polynucleotide fragment comprises the entire nucleotide sequence of SEQ ID NO:X
or the cDNA sequence included in the related cDNA clone, which is hybridizable to
SEQ ID NO:X.

5. The isolated nucleic acid molecule of claim 2, wherein the nucleotide
sequence comprises sequential nucleotide deletions from either the C-terminus or the
N-terminus.

6. The isolated nucleic acid molecule of claim 3, wherein the nucleotide
sequence comprises sequential nucleotide deletions from either the C-terminus or the
N-terminus.

7. A recombinant vector comprising the isolated nucleic acid molecule of
claim 1.

8. A method of making a recombinant host cell comprising the isolated
nucleic acid molecule of claim 1.

9. A recombinant host cell produced by the method of claim 8.

10. The recombinant host cell of claim 9 comprising vector sequences.

11. An isolated polypeptide comprising an amino acid sequence at least
95% identical to a sequence selected from the group consisting of:

(a) a polypeptide fragment of SEQ ID NO:Y or of the sequence encoded by 
the cDNA included in the related cDNA clone; 

(b) a polypeptide fragment of SEQ ID NO:Y or of the sequence encoded by 
the cDNA included in the related cDNA clone, having biological activity; 

(c) a polypeptide domain of SEQ ID NO:Y or of the sequence encoded by the
cDNA included in the related cDNA clone;

(d) a polypeptide epitope of SEQ ID NO:Y or of the sequence encoded by the 
cDNA included in the related cDNA clone; 

(e) a full length protein of SEQ ID NO:Y or of the sequence encoded by the
cDNA included in the related cDNA clone;

(f) a variant of SEQ ID NO:Y; 

(g) an allelic variant of SEQ ID NO:Y; or 

(h) a species homologue of the SEQ ID NO:Y. 
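
The "at least 95% identical" threshold recited in claim 11 can be illustrated with a small calculation. The claim language in this section does not state how percent identity is computed, so the following is only a minimal sketch under stated assumptions: the two sequences are already aligned to equal length, a gap character `-` counts as a mismatch, and the function names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over two pre-aligned, equal-length sequences.

    Assumed convention (not taken from the claims): a column with a gap
    in exactly one sequence counts as a mismatch, and a column that is
    a gap in both sequences is ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=95.0):
    """True when the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under this convention, a 20-residue alignment with a single substitution is 95% identical and would just meet the threshold; other conventions (for example, normalizing by the shorter sequence) can give different values.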

12. The isolated polypeptide of claim 11, wherein the full length protein
comprises sequential amino acid deletions from either the C-terminus or the
N-terminus.

13. An isolated antibody that binds specifically to the isolated polypeptide
of claim 11.

14. A recombinant host cell that expresses the isolated polypeptide of 
claim 11. 

15. A method of making an isolated polypeptide comprising:

(a) culturing the recombinant host cell of claim 14 under conditions such that
said polypeptide is expressed; and

(b) recovering said polypeptide.

16. The polypeptide produced by claim 15.

17. A method for preventing, treating, or ameliorating a medical condition,
comprising administering to a mammalian subject a therapeutically effective amount
of the polypeptide of claim 11 or the polynucleotide of claim 1.

18. A method of diagnosing a pathological condition or a susceptibility to
a pathological condition in a subject comprising:

(a) determining the presence or absence of a mutation in the polynucleotide of
claim 1; and

(b) diagnosing a pathological condition or a susceptibility to a pathological 
condition based on the presence or absence of said mutation. 

19. A method of diagnosing a pathological condition or a susceptibility to 
a pathological condition in a subject comprising: 

(a) determining the presence or amount of expression of the polypeptide of
claim 11 in a biological sample; and

(b) diagnosing a pathological condition or a susceptibility to a pathological 
condition based on the presence or amount of expression of the polypeptide. 

20. A method for identifying a binding partner to the polypeptide of claim
11 comprising:

(a) contacting the polypeptide of claim 11 with a binding partner; and

(b) determining whether the binding partner effects an activity of the
polypeptide.

21. The gene corresponding to the cDNA sequence of SEQ ID NO:Y.

22. A method of identifying an activity in a biological assay, wherein the
method comprises:

(a) expressing SEQ ID NO:X in a cell;

(b) isolating the supernatant; 

(c) detecting an activity in a biological assay; and 

(d) identifying the protein in the supernatant having the activity. 

23. The product produced by the method of claim 20.

SEQUENCE LISTING

<110> Craig Rosen, 
Steve Ruben 

<120> Human Pancreas and Pancreatic Cancer Associated Gene Sequences and
Polypeptides

<130> PA105PCT 

<140> Unassigned 
<141> 2000-03-08 

<150> 60/124,270 
<151> 1999-03-12 

<160> 928 

<170> PatentIn Ver. 2.0

<210> 1 
<211> 565 
<212> DNA 

<213> Homo sapiens 
<400> 1 

aagcaatcaa ctggaactca acaagccttg agttctcaaa gggggtgtgg gaaggttctt 60 

atacatcttc tatgaagggc agtttatcag tcactaaatt acaaatacat aaaccctttg 120 

tctcaccaaa cctgctggga atgaatccta catatatatt tatatgtgtg caggctacat 180 

ggttttcatt atgctattga tkttaaaagr aaaaatttgg gaaagaacct ccatgtccac 240

catggggaac tggctaagtc atttatggtg ggaatggtga tcataaaaat tcattaaaaa 300

tgaagactct atatacagct tgaatattcc ttatcgaaat gcttggggac cagaagtgtt 360

tcagatcttt gcatattttc tgtttctcat ttcagttcac attttctaag catgatattc 420

ctactgattt ccttgtagat attttgaaac agataagaaa ctcctccgtg gaaattactg 480

ttaactaggg aaaagacatg tttttttctg ttcataaaag taatgcgagt tctttgtagc 540 
aaaacttgga aaatgcataa aagta 565 
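
Each record in this listing declares a length under <211> and prints its residues in groups of ten with a running position count at the end of each row. Besides a, c, g, t and n, some rows carry ambiguity letters such as k and r. The following is a minimal sketch, assuming the standard IUPAC nucleotide codes apply, of how a record's rows might be checked against the declared length; the helper names `residues_from_rows` and `check_record` are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes (lowercase, as printed in the listing).
# That the listing follows these codes is an assumption of this sketch.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "k": "gt", "m": "ac",
    "s": "cg", "w": "at", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}


def residues_from_rows(rows):
    """Drop whitespace and the running position counts, keep the rest."""
    return re.sub(r"[\s\d]+", "", rows).lower()


def check_record(rows, declared_length):
    """Return a list of problems; an empty list means no problem was found."""
    seq = residues_from_rows(rows)
    problems = []
    if len(seq) != declared_length:
        problems.append(f"length {len(seq)} != declared {declared_length}")
    bad = sorted(set(seq) - set(IUPAC))
    if bad:
        problems.append(f"unexpected characters: {''.join(bad)}")
    return problems
```

A check of this kind flags rows whose residue count disagrees with <211> or that contain characters outside the assumed code table, which can help spot transcription damage in a printed listing.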

<210> 2 
<211> 1691 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature
<222> (1093)

<223> n equals a,t,g, or c
<400> 2

gagaggaaac caagccgggc gccxcttgca atggagacgg tcatttcttc 60 

ccagctgtgg aaaatgagca tcctcaagag accccagaat ccaacaatag 120 

tccttcatga agtctcatcg ctgctatgac ctgattccca caagctccaa 180 

tttgatacgt ccctgcaggt gaagaaagct ttttttgctt tggtgactaa 240 
cggtgtacga gctgcccctt tatgggatag taagaagcaa agttttgtgg gcatgctgac 300
catcactgat ttcatcaata tcctgcaccg ctactataaa tcagccttgg tacagatcta 360 
tgagctagaa gaacacaaga tagaaacttg gagagaggtg tatctccagg actcctttaa 420 
accgcttgtc tgcatttctc ctaatgccag cttgtttgat gctgtctctt cattaattcg 480 
gaacaagatc cacaggctgc cagttattga cccagaatca ggcaatactt tgtacatcct 540 
cacccacaag cgcattctga agttcctcaa attgtttatc actgagttcc ccaagccaga 600 
gttcatgtcc aagtctctgg aagagctaca gattggcacc tatgccaata ttgctatggt 660 
tcgcactacc acccccgtct atgtggctct ggggattttt gtacagcatc gagtctcagc 720 
cctgccagtg gtggatgaga aggggcgtgt ggtggacatc tactccaagt ttgatgttat 780 
caatctggca gcagaaaaga cctacaacaa cctagatgta tctgtgacta aagccttgca 840 
acatcgatca cattactttg agggtgttct caagtgctac ctgcatgaga ctctggagac 900 
catcatcaac aggctagtgg aagcagaggt tcaccgactt gtagtggtgg atgaaaatga 960 
tgtggtcaag ggaattgtat cactgtctga catcctgcag gccctggtgc tcacaggtgg 1020 
agagaagaag ccctgagctg ggggaagggg tcatgcagca ccaggggata tgcccaactc 1080 
actgcctgct ggnaactctg tgggaatcag atgaaacttg agggaattgt gactctgttc 1140 
cctgttcagg gtcccctgcc cttctatctg ggagctaggg aaggtatggg ggaggaaaga 1200 
gaatggattt atagctaccc ttaccctcac acatacactt gaaaaaactt tcagcctagc 1260 
cagttctagc ccctgtcctc ttagatatat ccccctttct gggtgaacta taggctctgt 1320 
gcctctcaga caaattctga tctctaagag atccccagac ctcacttgcc tctgcctcca 1380 
tcttggccct gattcaaccc taagataata gcacaacaaa attcttcata aagatatttt 1440 
tattcacctg ttccgtgcta tatggaggag gccaagtcca tttagtgaca tttcttccca 1500 
taatgtgagt ggggaggatt gtggggagga ggggctttgg gttcctgtgt ttgtgcatat 1560 
gaagggagat gggggttagg tggaggagga gagcagcgtg gttagctaag gttattgctt 1620 
tttgtggcaa atctaattaa atgacaggaa tctcttcaaa aaaaaaaaaa aaaaaaaaaa 1680 
aaaaaaaaaa a 1691 

<210> 3 

<211> 480 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature
<222> (471)

<223> n equals a,t,g, or c
<400> 3 

agctttagat ttggaggttt tgcggtactg aggggagttg ggggaagaat ggaaatacca 60 
ttgacctcct aaaaagttgt tacttgcaaa gtttgggagg tgacatcaaa aactcaactg 120 
cccttacaat agtcattcca tccatctgtt gcttatttga attctcattt atttttactt 180 
tatggcatta aaatacaata aatctgtcaa ttatgtattt tatattagta gtagcttaag 240
attgggtcac ttcatttcgg tagatataat tgttagtatt atccttcagg acaaaaagca 300 
tctgctaaca acctgtggtt taaaatatag gccaacttta tgttcaaaca ttatgttgat 360 
aatattttta gcagtattac acagtggagg tccaaattgg attagacttt tgcattgatt 420 
ccaagtttgg taagctagca caactwtaag tc-tgctaaa tcttgtgggg nacatatttt 480 

<210> 4 

<211> 608 

<212> DNA 

<213> Homo sapiens 
<220>

<221> misc_feature
<222> (564)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (578)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (582)

<223> n equals a,t,g, or c



<400> 4 

ctaatttctt gtcctatgga agtyctcgct gtctccatct ctctcatttt tgtgtcacct 60
aatatgttgg tacagataag agttagtcac attttyctga ctgcatcaaa cttttatttg 120
aaatggtact ggrtcttagt ttctgtgcag aatattctgt gaattttggg aaatgtaagt 180
gagtccactt gcttctggac caattctgtt tcatgtatgt tagcatccta gaaacaccta 240
gcaatggacc tagttcacag taataggtgc aaagaaagac caaatggact ttgcagtatt 300
aaccctttgc agtcgtcata cttagctgct gcctgtaatg ctaaaatgat tttaatggtt 360
gtctggaggc aaagggctgt tttttagtat attgccacta aaggacattt atttatatca 420
aacttttatt tttagatatt ataagcatac agtacataat tgatgaaatt gatatttact 480
agagatttat ggtagagaat ggacgacatt caataactgg gagcccgaga ttgtycactt 540
tattttaaaa agacaaataa tcmnctggac aagacagnca cngttggcca ttataaggga 600
gaaatagg 608



<210> 5 
<211> 696 
<212> DNA 

<213> Homo sapiens 



<400> 5 

ggctttacgg ctgcgagaag acgacagaag ggggtctctg ggcttttgct ctgtcaggct 60
ggtggcgttt tggtgtcttc gtttgttatg gccgctgctg tcgctatgga gacagatgat 120
gctggaaatc gacttcggtt tcagttggag ttggaatttg tgcaatgttt agccaaccca 180
aattacctta attttcttgc ccaaagaggt tacttcaaag acaaagcttt tgttaattat 240
cttaaatact tgctttactg gaaagaccca gaatatgcca agtatctaaa gtaccctcag 300
tgtttacaca tgttagagct gctccaatat gaacacttcc gaaaggagct ggtgaatgct 360
cagtgtgcga aatttattga tgaacagcag attctacatt ggcagcacta ttcccggaag 420
cggatgcgcc ttcagcaagc cttggcagag cagcaacagc aaaataacac atcgggaaaa 480
tgaaaaactg gatacaaacg aggcacttaa tacatgtata taatgtattt cttttgtaca 540
gtgaagacaa aaaaaatgaa aactcttcct atctaccttt attatggtag cccctagaac 600
ctttagtgtg ctttttcatc caacttttta ttgttaatag actatttctg ttaaacctga 660
attgttcttt gatttccctg tggattrgta aggtgt 696



<210> 6 
<211> 292 
<212> DNA 

<213> Homo sapiens 
<222> (575)

<223> n equals a,t,g, or c 



<400> 306 

gcggacgtgg gcaggagggc tggaaaagcc ggcgctggag cgggaacggg agtagctgcc 60
tgggcgccaa aggccgcggc actcccacgc ggaccccgaa gtccgcaacc cggggatggg 120
cccgcggctg craggggatc ttctctggat caagcaatgg tggtgaaaaa tgtttcgcaa 180
gggcaaaaaa cgacacagta gtagcagttc ccaaagtagc gaaatcagta ctaagagcaa 240
gtctgtggat tctagccttg ggggtctttc acgatccagc actgtggcca gcctcgacac 300
agattccacc aaaagctcag gacaaagcaa caataattca gatacctgtg cagaatttcg 360
aataaaatat gttggtgcca ttgagaaact gaaactctcc gagggaaaag gccttgaagg 420
gccattagac ctgataaatt atatagacgt tgcccagcaa gatggaaagt tgccttttgt 480
tcctccggag gaagaattta ttatgggagt ttccaagtat ggcataaaag tattcaacat 540
cagrtcaata tgtaagtnat ataatttatt aaganaacta tgttttagat aacagggaat 600
tcaggccatt aagagccccc ttataattag ggccactcct gtttgcagag tgattggttt 660
gtaaacat 668



<210> 307 

<211> 1046 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature
<222> (4)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (14)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (946)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (948) 

<223> n equals a,t,g, or c 



<400> 307 

cggnacgcgt gggncggacg cgtggggttt tgaatgttca tgtatgaatg ctgcagctgt 60
gaagcataca taaataaatg aagtaagcca tactgattta atttattgga tgttattttc 120
cctaagacct gaaaatgaac atagtatgct agttattttt cagtgttagc cttttacttt 180
cctcacacaa tttggaatca tataatatag gtactttgrc cctgartaaa taatgtgacg 240
gatagaatgc atcaagtgtt tattacgaaa agagtggaaa agtatatagc ttttagcaaa 300
aggtgtttgc ccattctaag aaatgagcga atatatagaa atagtgtggg catttcttcc 360
tgttaggtgg agtgtatgtg ttgacattrc tccccatctc ttcccactct gttttctccc 420
cattatttga ataaagtgac tgctgaagat gactttgaat ccttatccac ttaatttaat 480
gtttaaagaa aaacctgtaa tggaaagtra gactccttcc ctaatttcag tttagagcaa 540

cttgaagaag agtagacaaa aaataaaatg cacatagaaa aagagaaaaa gggcacaaag 600 

ggattggccc aatattgatt cttttttata aaacctcctt tggcttagaa ggaatgactc 660 

tagctacaat aatacacagt atgtttaagc aggttccctt ggttgttgca ttaaatgtaa 720 

tccaccttta ggtattttag agcacagaac aacactgtgt tgatctagta ggtttctatt 780

tttcctttct ctttacaatg cacataatac tttcctgtat ttatatcata acgtgtatag 840 

tgtaaaatgt gaatgacttt ttttgtgaat gaaaatctaa aatctttgta actttttata 900 

tctgcttttg tttcaccaaa gaaacctaaa atccttcttt tamwananaa aaaaaaaaaa 960 

aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 1020 

aaaaaaaaaa aagggcggcc gtttta 1046 

<210> 308 

<211> 1686 

<212> DNA 

<213> Homo sapiens

<220> 

<221> misc_feature
<222> (7)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (29)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (39)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (117)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (1522)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (1551)

<223> n equals a,t,g, or c
<220>

<221> misc_feature
<222> (1627) 

<223> n equals a,t,g, or c 